Preface


A good part of matrix theory is functional analytic in spirit. This statement 
can be turned around. There are many problems in operator theory, where 
most of the complexities and subtleties are present in the finite-dimensional 
case. My purpose in writing this book is to present a systematic treatment 
of methods that are useful in the study of such problems. 

This book is intended for use as a text for upper division and gradu- 
ate courses. Courses based on parts of the material have been given by 
me at the Indian Statistical Institute and at the University of Toronto (in 
collaboration with Chandler Davis). The book should also be useful as a 
reference for research workers in linear algebra, operator theory, mathe- 
matical physics and numerical analysis. 

A possible subtitle of this book could be Matrix Inequalities. A reader 
who works through the book should expect to become proficient in the art 
of deriving such inequalities. Other authors have compared this art to that 
of cutting diamonds. One first has to acquire hard tools and then learn how 
to use them delicately. 

The reader is expected to be very thoroughly familiar with basic lin- 
ear algebra. The standard texts Finite-Dimensional Vector Spaces by P.R. 
Halmos and Linear Algebra by K. Hoffman and R. Kunze provide adequate 
preparation for this. In addition, a basic knowledge of functional analy- 
sis, complex analysis and differential geometry is necessary. The usual first 
courses in these subjects cover all that is used in this book. 

The book is divided, conceptually, into three parts. The first five chapters 
contain topics that are basic to much of the subject. (Of these, Chapter 5 
is more advanced and also more special.) Chapters 6 to 8 are devoted to 
perturbation of spectra, a topic of much importance in numerical analysis,
physics and engineering. The last two chapters contain inequalities and 
perturbation bounds for other matrix functions. These too have been of 
broad interest in several areas. 

In Chapter 1, I have given a very brief and rapid review of some basic 
topics. The aim is not to provide a crash course but to remind the reader 
of some important ideas and theorems and to set up the notations that are 
used in the rest of the book. The emphasis, the viewpoint, and some proofs 
may be different from what the reader has seen earlier. Special attention 
is given to multilinear algebra; and inequalities for matrices and matrix 
functions are introduced rather early. After the first chapter, the exposition 
proceeds at a much more leisurely pace. The contents of each chapter have 
been summarised in its first paragraph. 

The book can be used for a variety of graduate courses. Chapters 1 
to 4 should be included in any course on Matrix Analysis. After this, if 
perturbation theory of spectra is to be emphasized, the instructor can go 
on to Chapters 6, 7 and 8. With a judicious choice of topics from these
chapters, she can design a one-semester course. For example, Chapters 7 
and 8 are independent of each other, as are the different sections in Chapter 
8. Alternately, a one-semester course could include much of Chapters 1 
to 5, Chapter 9, and the first part of Chapter 10. All topics could be 
covered comfortably in a two-semester course. The book can also be used 
to supplement courses on operator theory, operator algebras and numerical 
linear algebra. The book has several exercises scattered in the text and a 
section called Problems at the end of each chapter. An exercise is placed at a
particular spot with the idea that the reader should do it at that stage of his 
reading and then proceed further. Problems, on the other hand, are designed 
to serve different purposes. Some of them are supplementary exercises, 
while others are about themes that are related to the main development in 
the text. Some are quite easy while others are hard enough to be contents 
of research papers. From Chapter 6 onwards, I have also used the problems 
for another purpose. There are results, or proofs, which are a bit too special
to be placed in the main text. At the same time they are interesting enough 
to merit the attention of anyone working, or planning to work, in this area. 
I have stated such results as parts of the Problems section, often with 
hints about their solutions. This should enhance the value of the book as 
a reference, and provide topics for a seminar course as well. The reader 
should not be discouraged if he finds some of these problems difficult. At a 
few places I have drawn attention to some unsolved research problems. At 
some others, the existence of such problems can be inferred from the text. 
I hope the book will encourage some readers to solve these problems too. 

While most of the notations used are the standard ones, some need a 
little explanation:

Almost all functional analysis books written by mathematicians adopt 
the convention that an inner product (u,v) is linear in the variable u and 
conjugate-linear in the variable v. Physicists and numerical analysts adopt
the opposite convention, and different notations as well. There would be no 
special reason to prefer one over the other, except that certain calculations 
and manipulations become much simpler in the latter notation. If u and v 
are column vectors, then u*v is the product of a row vector and a column 
vector, hence a number. This is the inner product of u and v. Combined 
with the usual rules of matrix multiplication, this facilitates computations. 
For this reason, I have chosen the second convention about inner products, 
with the belief that the initial discomfort this causes some readers will be 
offset by the eventual advantages. (Dirac’s bra and ket notation, used by 
physicists, is different typographically but has the same idea behind it.) 

The k-fold tensor power of an operator is represented in this book as
$\otimes^k A$, the antisymmetric and the symmetric tensor powers as
$\wedge^k A$ and $\vee^k A$, respectively. This helps in thinking of these
objects as maps, $A \to \otimes^k A$, etc. We often study the variational
behaviour of, and perturbation bounds for, functions of operators. In such
contexts, this notation is natural.

Very often we have to compare two n-tuples of numbers after rearranging
them. For this I have used a pictorial notation that makes it easy to
remember the order that has been chosen. If $x = (x_1, \ldots, x_n)$ is a
vector with real coordinates, then $x^{\downarrow}$ and $x^{\uparrow}$ are
vectors whose coordinates are obtained by rearranging the numbers $x_j$ in
decreasing order and in increasing order, respectively. We write
$x^{\downarrow} = (x_1^{\downarrow}, \ldots, x_n^{\downarrow})$ and
$x^{\uparrow} = (x_1^{\uparrow}, \ldots, x_n^{\uparrow})$, where
$x_1^{\downarrow} \ge \cdots \ge x_n^{\downarrow}$ and
$x_1^{\uparrow} \le \cdots \le x_n^{\uparrow}$.

The symbol $|||\cdot|||$ stands for a unitarily invariant norm on matrices:
one that satisfies the equality $|||UAV||| = |||A|||$ for all A and for all
unitary U, V. A statement like $|||A||| \le |||B|||$ means that, for the
matrices A and B, this inequality is true simultaneously for all unitarily
invariant norms. The supremum norm of A, as an operator on the space
$\mathbb{C}^n$, is always written as $\|A\|$. Other norms carry special
subscripts. For example, the Frobenius norm, or the Hilbert-Schmidt norm, is
written as $\|A\|_2$. (This should be noted by numerical analysts who often
use the symbol $\|A\|_2$ for what we call $\|A\|$.)

A few symbols have different meanings in different contexts. The reader's
attention is drawn to three such symbols. If x is a complex number, $|x|$
denotes the absolute value of x. If x is an n-vector with coordinates
$(x_1, \ldots, x_n)$ then $|x|$ is the vector $(|x_1|, \ldots, |x_n|)$. For a
matrix A, the symbol $|A|$ stands for the positive semidefinite matrix
$(A^*A)^{1/2}$. If J is a finite set, $|J|$ denotes the number of elements
of J. A permutation on n indices is often denoted by the symbol $\sigma$. In
this case, $\sigma(j)$ is the image of the index j under the map $\sigma$.
For a matrix A, $\sigma(A)$ represents the spectrum of A. The trace of a
matrix A is written as tr A. In analogy, if $x = (x_1, \ldots, x_n)$ is a
vector, we write tr x for the sum $\sum_j x_j$.

The words matrix and operator are used interchangeably in the book. 
When a statement about an operator is purely finite-dimensional in content, 
I use the word matrix. If a statement is true also in infinite-dimensional
spaces, possibly with a small modification, I use either the word matrix or 
the word operator. Many of the theorems in this book have extensions to 
infinite-dimensional spaces. 

Several colleagues have contributed to this book, directly and indirectly. I 
am thankful to all of them. T. Ando, J.S. Aujla, R.B. Bapat, A. Ben Israel, 
I. Ionascu, A.K. Lal, R.-C. Li, S.K. Narayan, D. Petz and P. Rosenthal read
parts of the manuscript and brought several errors to my attention. Fumio 
Hiai read the whole book with his characteristic meticulous attention and 
helped me eliminate many mistakes and obscurities. Long-time friends and 
coworkers M.D. Choi, L. Elsner, J.A.R. Holbrook, R. Horn, F. Kittaneh, 
A. McIntosh, K. Mukherjea, K.R. Parthasarathy, P. Rosenthal and K.B. 
Sinha have generously shared with me their ideas and insights. These ideas,
collected over the years, have influenced my writing. 

I owe a special debt to T. Ando. I first learnt some of the topics presented 
here from his Hokkaido University lecture notes. I have also learnt much 
from discussions and correspondence with him. I have taken a lot from his 
notes while writing this book. 

The idea of writing this book came from Chandler Davis in 1986. Various 
logistic difficulties forced us to abandon our original plans of writing it 
together. The book is certainly the poorer for it. Chandler, however, has 
contributed so much to my mathematics, to my life, and to this project, 
that this is as much his book as it is mine. 

I am thankful to the Indian Statistical Institute, whose facilities have 
made it possible to write this book. I am also thankful to the Department 
of Mathematics of the University of Toronto and to NSERC Canada, for 
several visits that helped this project take shape. 

It is a pleasure to thank V.P. Sharma for his LaTeX typing, done with
competence and with good cheer, and the staff at Springer-Verlag for their 
help and support. 

My most valuable resource while writing has been the unstinting and
ungrudging support from my son Gautam and wife Irpinder. Without that, 
this project might have been postponed indefinitely. 


Rajendra Bhatia 
I
A Review of Linear Algebra 


In this chapter we review, at a brisk pace, the basic concepts of linear and 
multilinear algebra. Most of the material will be familiar to a reader who 
has had a standard Linear Algebra course, so it is presented quickly with 
no proofs. Some topics, like tensor products, might be less familiar. These 
are treated here in somewhat greater detail. A few of the topics are quite 
advanced and their presentation is new. 


I.1 Vector Spaces and Inner Product Spaces 


Throughout this book we will consider finite-dimensional vector spaces over 
the field C of complex numbers. Such spaces will be denoted by symbols 
V, W, $V_1$, $V_2$, etc. Vectors will, most often, be represented by symbols u, v,
w, x, etc., and scalars by a, b, s, t, etc. The symbol n, when not explained, 
will always mean the dimension of the vector space under consideration. 

Most often, our vector space will be an inner product space. The inner
product between the vectors u, v will be denoted by (u, v). We will adopt
the convention that this is conjugate-linear in the first variable u and linear
in the second variable v. We will always assume that the inner product is
definite; i.e., (u, u) = 0 if and only if u = 0. A vector space with such
an inner product is then a finite-dimensional Hilbert space. Spaces of this
type will be denoted by symbols H, K, etc. The norm arising from the inner
product will be denoted by $\|u\|$; i.e., $\|u\| = (u, u)^{1/2}$.

As usual, it will sometimes be convenient to deal with the standard
Hilbert space $\mathbb{C}^n$. Elements of this vector space are column vectors
with n coordinates. In this case, the inner product (u, v) is the matrix product
u*v obtained by multiplying the column vector v on the left by the row
vector u*. The symbol * denotes the conjugate transpose for matrices of
any size. The notation u*v for the inner product is sometimes convenient
even when the Hilbert space is not $\mathbb{C}^n$.
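
The convention can be checked numerically. The following is a minimal sketch in Python with NumPy (a tool assumed only for illustration; it is not part of the text). The vectors u, v and the scalar c are arbitrary choices.

```python
import numpy as np

# Illustrative vectors in C^2 (arbitrary choices, not from the text).
u = np.array([1 + 2j, 3 - 1j])
v = np.array([2 - 1j, 0 + 1j])
c = 2 - 3j

# (u, v) = u* v: conjugate the entries of u, then multiply.
inner = u.conj() @ v

# The outer product u v* is an n x n matrix.
outer = np.outer(u, v.conj())

# Conjugate-linear in the first argument, linear in the second.
first_slot = (c * u).conj() @ v      # should equal conj(c) * (u, v)
second_slot = u.conj() @ (c * v)     # should equal c * (u, v)
```

NumPy's `np.vdot` conjugates its first argument, so it follows the same convention as the one adopted here.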

The distinction between column vectors and row vectors is important in
manipulations involving products. For example, if we write elements of
$\mathbb{C}^n$ as column vectors, then u*v is a number, but uv* is an
$n \times n$ matrix (sometimes called the "outer product" of u and v).
However, it is typographically inconvenient to write column vectors. So, when
the context does not demand this distinction, we may write a vector x with
scalar coordinates $x_1, \ldots, x_n$ simply as $(x_1, \ldots, x_n)$. This will
often be done in later chapters. For the present, however, we will maintain
the distinction between row and column vectors.

Occasionally our Hilbert spaces will be real, but we will use the same 
notation for them as for the complex ones. Many of our results will be true 
for infinite-dimensional Hilbert spaces, with appropriate modifications at 
times. We will mention this only in passing. 

Let $X = (x_1, \ldots, x_k)$ be a k-tuple of vectors. If these are column
vectors, then X is an $n \times k$ matrix. This notation suggests matrix
manipulations with X that are helpful even in the general case.

For example, let $X = (x_1, \ldots, x_k)$ be a linearly independent k-tuple. We
say that a k-tuple $Y = (y_1, \ldots, y_k)$ is biorthogonal to X if
$(y_i, x_j) = \delta_{ij}$. This condition is expressed in matrix terms as
$Y^*X = I_k$, the $k \times k$ identity matrix.


Exercise I.1.1 Given any k-tuple of linearly independent vectors X as
above, there exists a k-tuple Y biorthogonal to it. If k = n, this Y is unique.


The Gram-Schmidt procedure, in this notation, can be interpreted as a
matrix factoring theorem. Given an n-tuple $X = (x_1, \ldots, x_n)$ of linearly
independent vectors the procedure gives another n-tuple $Q = (q_1, \ldots, q_n)$
whose entries are orthonormal vectors. For each k = 1, 2, ..., n, the vectors
$\{x_1, \ldots, x_k\}$ and $\{q_1, \ldots, q_k\}$ have the same linear span. In
matrix notation this can be expressed as an equation, X = QR, where R is an
upper triangular matrix. The matrix R may be chosen so that all its diagonal
entries are positive. With this restriction the factors Q and R are both
unique. If the vectors $x_j$ are not linearly independent, this procedure can
be modified. If the vector $x_k$ is linearly dependent on $x_1, \ldots, x_{k-1}$,
set $q_k = 0$; otherwise proceed as in the Gram-Schmidt process. If the kth
column of the matrix Q so constructed is zero, put the kth row of R to be
zero. Now we have a factorisation X = QR, where R is upper triangular
and Q has orthogonal columns, some of which are zero. Take the nonzero
columns of Q and extend this set to an orthonormal basis. Then, replace
the zero columns of Q by these additional basis vectors. The new matrix
Q now has orthonormal columns, and we still have X = QR, because the
new columns of Q are matched with zero rows of R. This is called the QR
decomposition.
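
As a hedged illustration of the normalisation in which R has positive diagonal entries, here is a small NumPy sketch. The matrix X is an arbitrary invertible real example, and the library routine `np.linalg.qr` is used in place of an explicit Gram-Schmidt loop; since the library does not promise a positive diagonal, the signs are adjusted afterwards.

```python
import numpy as np

# An arbitrary invertible real matrix (illustrative choice).
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

Q, R = np.linalg.qr(X)

# Flip signs so that every diagonal entry of R is positive.
# Multiplying column j of Q and row j of R by the same sign keeps QR fixed.
signs = np.sign(np.diag(R))
Q = Q * signs              # scales the columns of Q
R = signs[:, None] * R     # scales the rows of R
```

For complex matrices the same idea would use unit-modulus phases instead of signs; that case is not covered by this sketch.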

Similarly, a change of orthogonal bases can be conveniently expressed in
these notations as follows. Let $X = (x_1, \ldots, x_k)$ be any k-tuple of
vectors and $E = (e_1, \ldots, e_n)$ any orthonormal basis. Then, the columns of
the matrix $E^*X$ are the representations of the vectors comprising X, relative
to the basis E. When k = n and X is an orthonormal basis, then $E^*X$ is a
unitary matrix. Furthermore, this is the matrix by which we pass between
coordinates of any vector relative to the basis E and those relative to the
basis X. Indeed, if

$$u = a_1 e_1 + \cdots + a_n e_n = b_1 x_1 + \cdots + b_n x_n,$$

then we have

$$u = Ea, \quad a_j = e_j^* u, \quad a = E^* u,$$
$$u = Xb, \quad b_j = x_j^* u, \quad b = X^* u.$$

Hence,

$$a = E^*Xb \quad \text{and} \quad b = X^*Ea.$$


Exercise I.1.2 Let X be any basis of H and let Y be the basis biorthogonal
to it. Using matrix multiplication, X gives a linear transformation from
$\mathbb{C}^n$ to H. The inverse of this is given by Y*. In the special case when
X is orthonormal (so that Y = X), this transformation is inner-product-
preserving if the standard inner product is used on $\mathbb{C}^n$.


Exercise I.1.3 Use the QR decomposition to prove Hadamard's inequality:
if $X = (x_1, \ldots, x_n)$, then

$$|\det X| \le \prod_{j=1}^{n} \|x_j\|.$$

Equality holds here if and only if either the $x_j$ are mutually orthogonal or
some $x_j$ is zero.
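
A numerical sanity check of the inequality (not a proof) might look as follows. This is a sketch assuming NumPy; the function name `hadamard_sides` and both test matrices are hypothetical choices made for illustration.

```python
import numpy as np

def hadamard_sides(X):
    """Return (|det X|, product of the Euclidean norms of the columns of X)."""
    lhs = abs(np.linalg.det(X))
    rhs = np.prod(np.linalg.norm(X, axis=0))
    return lhs, rhs

# A generic matrix: strict inequality is expected.
generic = np.array([[1.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0],
                    [0.0, 1.0, 1.0]])

# Mutually orthogonal columns: equality is expected.
orthogonal_cols = np.array([[1.0, 1.0],
                            [1.0, -1.0]])

lhs_g, rhs_g = hadamard_sides(generic)
lhs_o, rhs_o = hadamard_sides(orthogonal_cols)
```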


I.2 Linear Operators and Matrices


Let L(V, W) be the space of all linear operators from a vector space V to 
a vector space W. If bases for V,W are fixed, each such operator has a 
unique matrix associated with it. As usual, we will talk of operators and 
matrices interchangeably. 

For operators between Hilbert spaces, the matrix representations are
especially nice if the bases chosen are orthonormal. Let $A \in L(H, K)$, and
let $E = (e_1, \ldots, e_n)$ be an orthonormal basis of H and
$F = (f_1, \ldots, f_m)$ an orthonormal basis of K. Then, the (i, j)-entry of
the matrix of A relative to these bases is $a_{ij} = f_i^* A e_j = (f_i, A e_j)$.
This suggests that we may say that the matrix of A relative to these bases is
$F^*AE$.

In this notation, composition of linear operators can be identified with
matrix multiplication as follows. Let M be a third Hilbert space with
orthonormal basis $G = (g_1, \ldots, g_p)$. Let $B \in L(K, M)$. Then

$$\begin{aligned}
(\text{matrix of } B \cdot A) &= G^*(B \cdot A)E \\
&= G^* B F F^* A E \\
&= (G^* B F)(F^* A E) \\
&= (\text{matrix of } B)\,(\text{matrix of } A).
\end{aligned}$$

The second step in the above chain is justified by Exercise I.1.2.


The adjoint of an operator $A \in L(H, K)$ is the unique operator A* in
L(K, H) that satisfies the relation

$$(z, Ax)_K = (A^*z, x)_H$$

for all $x \in H$ and $z \in K$.


Exercise I.2.1 For fixed bases in H and K, the matrix of A* is the
conjugate transpose of the matrix of A.


For the space L(H, H) we use the more compact notation L(H). In the
rest of this section, and elsewhere in the book, if no qualification is made,
an operator would mean an element of L(H).

An operator A is called self-adjoint or Hermitian if A = A*, skew-
Hermitian if A = −A*, unitary if AA* = I = A*A, and normal if
AA* = A*A.

A Hermitian operator A is said to be positive or positive semidefinite
if $(x, Ax) \ge 0$ for all $x \in H$. The notation $A \ge 0$ will be used to
express the fact that A is a positive operator. If $(x, Ax) > 0$ for all
nonzero x, we will say A is positive definite, or strictly positive. We will
then write $A > 0$. A positive operator is strictly positive if and only if it
is invertible. If A and B are Hermitian, then we say $A \ge B$ if $A - B \ge 0$.

Given any operator A we can find an orthonormal basis $y_1, \ldots, y_n$ such
that for each k = 1, 2, ..., n, the vector $Ay_k$ is a linear combination of
$y_1, \ldots, y_k$. This can be proved by induction on the dimension n of H. Let
$\lambda_1$ be any eigenvalue of A and $y_1$ a unit eigenvector corresponding to
$\lambda_1$, and M the 1-dimensional subspace spanned by it. Let N be the
orthogonal complement of M. Let $P_N$ denote the orthogonal projection on N.
For $y \in N$, let $A_N y = P_N A y$. Then, $A_N$ is a linear operator on the
(n − 1)-dimensional space N. So, by the induction hypothesis, there exists an
orthonormal basis $y_2, \ldots, y_n$ of N such that for k = 2, ..., n the vector
$A_N y_k$ is a linear combination of $y_2, \ldots, y_k$. Now $y_1, \ldots, y_n$ is
an orthonormal basis for H, and each $Ay_k$ is a linear combination of
$y_1, \ldots, y_k$ for k = 1, 2, ..., n. Thus, the matrix of A with respect to
this basis is upper triangular. In other words, every matrix A is unitarily
equivalent (or unitarily similar) to an upper triangular matrix T, i.e.,
A = QTQ*, where Q is unitary and T is upper triangular. This triangular
matrix is called a Schur Triangular Form for A. An orthonormal basis with
respect to which A is upper triangular is called a Schur basis for A. If A is
normal, then T is diagonal and we have Q*AQ = D, where D is a diagonal
matrix whose diagonal entries are the eigenvalues of A. This is the Spectral
Theorem for normal matrices.
The Spectral Theorem makes it easy to define functions of normal matri-
ces. If f is any complex function, and if D is a diagonal matrix with
$\lambda_1, \ldots, \lambda_n$ on its diagonal, then f(D) is the diagonal matrix
with $f(\lambda_1), \ldots, f(\lambda_n)$ on its diagonal. If A = QDQ*, then
f(A) = Qf(D)Q*. A special consequence, used very often, is the fact that every
positive operator A has a unique positive square root. This square root will
be written as $A^{1/2}$.
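
The recipe f(A) = Qf(D)Q* can be sketched in NumPy for the Hermitian case. The helper name `apply_function` is hypothetical, the matrix A is an arbitrary positive example, and `np.linalg.eigh` is assumed, which restricts this sketch to Hermitian (not general normal) matrices.

```python
import numpy as np

def apply_function(A, f):
    """f(A) = Q f(D) Q* for a Hermitian matrix A = Q D Q*."""
    eigenvalues, Q = np.linalg.eigh(A)
    return Q @ np.diag(f(eigenvalues)) @ Q.conj().T

# A positive matrix with eigenvalues 1 and 3 (illustrative choice).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

root = apply_function(A, np.sqrt)   # candidate for A^{1/2}
```

Squaring `root` should recover A, and `root` itself should be Hermitian with positive eigenvalues.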


Exercise I.2.2 Show that the following statements are equivalent:
(i) A is positive.
(ii) A = B*B for some B.
(iii) A = T*T for some upper triangular T.
(iv) A = T*T for some upper triangular T with nonnegative diagonal
entries.

If A is positive definite, then the factorisation in (iv) is unique. This is
called the Cholesky Decomposition of A.
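
A short NumPy sketch of the factorisation in (iv) follows. Note the assumption about the library: `np.linalg.cholesky` returns a lower triangular factor L with A = LL*, so the upper triangular T of the text is taken to be L*. The matrix A is an arbitrary positive definite example.

```python
import numpy as np

# An arbitrary positive definite matrix (illustrative choice).
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(A)   # lower triangular, A = L L*
T = L.conj().T              # upper triangular, A = T* T
```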


Exercise I.2.3 (i) Let $\{A_\alpha\}$ be a family of mutually commuting
operators. Then, there is a common Schur basis for $\{A_\alpha\}$. In other
words, there exists a unitary Q such that $Q^* A_\alpha Q$ is upper triangular
for all $\alpha$.

(ii) Let $\{A_\alpha\}$ be a family of mutually commuting normal operators.
Then, there exists a unitary Q such that $Q^* A_\alpha Q$ is diagonal for all
$\alpha$.


For any operator A the operator A*A is always positive, and its unique
positive square root is denoted by $|A|$. The eigenvalues of $|A|$ counted with
multiplicities are called the singular values of A. We will always enumerate
these in decreasing order, and use for them the notation
$s_1(A) \ge s_2(A) \ge \cdots \ge s_n(A)$.

If rank A = k, then $s_k(A) > 0$, but $s_{k+1}(A) = \cdots = s_n(A) = 0$. Let S
be the diagonal matrix with diagonal entries $s_1(A), \ldots, s_n(A)$ and $S_+$
the $k \times k$ diagonal matrix with diagonal entries $s_1(A), \ldots, s_k(A)$.
Let $Q = (Q_1, Q_2)$ be the unitary matrix in which $Q_1$ is the $n \times k$
matrix whose columns are the eigenvectors of A*A corresponding to the
eigenvalues $s_1^2(A), \ldots, s_k^2(A)$ and $Q_2$ the $n \times (n-k)$ matrix
whose columns are the eigenvectors of A*A corresponding to the remaining
eigenvalues. Then, by the Spectral Theorem

$$Q^*(A^*A)Q = \begin{pmatrix} S_+^2 & 0 \\ 0 & 0 \end{pmatrix}.$$

Note that

$$Q_1^*(A^*A)Q_1 = S_+^2, \qquad Q_2^*(A^*A)Q_2 = 0.$$

The second of these relations implies that $AQ_2 = 0$. From the first one we
can conclude that if $W_1 = AQ_1 S_+^{-1}$, then $W_1^* W_1 = I_k$. Choose $W_2$
so that $W = (W_1, W_2)$ is unitary. Then, we have

$$W^*AQ = \begin{pmatrix} W_1^*AQ_1 & W_1^*AQ_2 \\ W_2^*AQ_1 & W_2^*AQ_2 \end{pmatrix}
= \begin{pmatrix} S_+ & 0 \\ 0 & 0 \end{pmatrix}.$$

This is the Singular Value Decomposition: for every matrix A there
exist unitaries W and Q such that

$$W^*AQ = S,$$

where S is the diagonal matrix whose diagonal entries are the singular
values of A.

Note that in the above representation the columns of Q are eigenvectors
of A*A and the columns of W are eigenvectors of AA* corresponding to
the eigenvalues $s_j^2(A)$, $1 \le j \le n$. These eigenvectors are called the
right and left singular vectors of A, respectively.
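
The statement W*AQ = S can be illustrated with NumPy, under the assumption that `np.linalg.svd` returns factors with A = U diag(s) Vh and s in decreasing order; in the notation of the text this means W = U and Q = Vh*. The matrix A is an arbitrary example.

```python
import numpy as np

# An arbitrary 2 x 2 example (illustrative choice).
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# NumPy returns A = U diag(s) Vh with s in decreasing order.
U, s, Vh = np.linalg.svd(A)
W, Q = U, Vh.conj().T        # so that W* A Q = S
S = W.conj().T @ A @ Q

# The squares of the singular values are the eigenvalues of A* A.
eig_AstarA = np.linalg.eigvalsh(A.conj().T @ A)[::-1]   # decreasing order
```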


Exercise I.2.4 (i) The Singular Value Decomposition leads to the Polar
Decomposition: Every operator A can be written as A = UP, where U
is unitary and P is positive. In this decomposition the positive part P is
unique, $P = |A|$. The unitary part U is unique if A is invertible.

(ii) An operator A is normal if and only if the factors U and P in the
polar decomposition of A commute.

(iii) We have derived the Polar Decomposition from the Singular Value
Decomposition. Show that it is possible to derive the latter from the former.
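
One way to see part (i) concretely is to regroup the factors of a singular value decomposition. The sketch below assumes NumPy and writes A = W diag(s) Vh = (W Vh)(Vh* diag(s) Vh); the helper name `polar` and the test matrix are hypothetical.

```python
import numpy as np

def polar(A):
    """Return (U, P) with A = U P, U unitary and P = |A| positive."""
    W, s, Vh = np.linalg.svd(A)
    U = W @ Vh                          # product of unitaries
    P = Vh.conj().T @ np.diag(s) @ Vh   # positive, with P^2 = A* A
    return U, P

# An arbitrary invertible, non-normal example (illustrative choice).
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
U, P = polar(A)
```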


Every operator A can be decomposed as a sum

$$A = \operatorname{Re} A + i \operatorname{Im} A,$$

where $\operatorname{Re} A = \frac{A + A^*}{2}$ and
$\operatorname{Im} A = \frac{A - A^*}{2i}$. This is called the Cartesian
Decomposition of A into its "real" and "imaginary" parts. The operators
Re A and Im A are both Hermitian.


The norm of an operator A is defined as

$$\|A\| = \sup_{\|x\| = 1} \|Ax\|.$$

We also have

$$\|A\| = \sup_{\|x\| = \|y\| = 1} |(y, Ax)|.$$

When A is Hermitian we have

$$\|A\| = \sup_{\|x\| = 1} |(x, Ax)|.$$

For every operator A we have

$$\|A\| = s_1(A) = \|A^*A\|^{1/2}.$$

When A is normal we have

$$\|A\| = \max\{|\lambda_j| : \lambda_j \text{ is an eigenvalue of } A\}.$$


An operator A is said to be a contraction if $\|A\| \le 1$. We also use
the adjective contractive for such an operator. A positive operator A is
contractive if and only if $A \le I$.


To distinguish it from other norms that we consider later, the norm $\|A\|$
will be called the operator norm or the bound norm of A.
Another useful norm is the norm

$$\|A\|_2 = \Big(\sum_{j=1}^{n} s_j^2(A)\Big)^{1/2} = (\operatorname{tr} A^*A)^{1/2},$$

where tr stands for the trace of an operator. If $a_{ij}$ are the entries of a
matrix representation of A relative to an orthonormal basis of H, then

$$\|A\|_2 = \Big(\sum_{i,j} |a_{ij}|^2\Big)^{1/2}.$$

This makes this norm useful in calculations with matrices. This is called
the Frobenius norm or the Schatten 2-norm or the Hilbert-Schmidt
norm.


Both $\|A\|$ and $\|A\|_2$ have an important invariance property called
unitary invariance: we have $\|A\| = \|UAV\|$ and $\|A\|_2 = \|UAV\|_2$ for all
unitary U, V.

Any two norms on a finite-dimensional space are equivalent. For the
norms $\|A\|$ and $\|A\|_2$ it follows from the properties listed above that

$$\|A\| \le \|A\|_2 \le n^{1/2}\|A\|$$

for every A.
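
These relations between the two norms can be checked on an example. The sketch assumes NumPy, where `np.linalg.norm(A, 2)` is the operator norm and `np.linalg.norm(A, "fro")` is the Frobenius norm; the matrix A and the rotation angle t are arbitrary choices.

```python
import numpy as np

# An arbitrary example (illustrative choice).
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
n = A.shape[0]
s = np.linalg.svd(A, compute_uv=False)   # singular values, decreasing

op_norm = np.linalg.norm(A, 2)           # operator norm ||A||
fro_norm = np.linalg.norm(A, "fro")      # Frobenius norm ||A||_2

# A rotation is unitary; both norms should be unchanged by it.
t = 0.7
U = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
rotated = U @ A @ U.T
```

Readers used to the numerical analysts' convention should note that the `2` in `np.linalg.norm(A, 2)` refers to what the text calls the operator norm, not to $\|A\|_2$.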


Exercise I.2.5 Show that matrices with distinct eigenvalues are dense in
the space of all $n \times n$ matrices. (Use the Schur Triangularisation.)


Exercise I.2.6 If $\|A\| < 1$, then $I - A$ is invertible and

$$(I - A)^{-1} = I + A + A^2 + \cdots,$$

a convergent power series. This is called the Neumann Series.
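
The convergence can be observed numerically by comparing a partial sum with the inverse. This is only a sketch, assuming NumPy; the function name `neumann_partial_sum`, the matrix A, and the number of terms are hypothetical choices.

```python
import numpy as np

def neumann_partial_sum(A, terms):
    """Return I + A + A^2 + ... + A^(terms - 1)."""
    total = np.eye(A.shape[0])
    power = np.eye(A.shape[0])
    for _ in range(terms - 1):
        power = power @ A
        total = total + power
    return total

# An arbitrary matrix with operator norm less than 1.
A = np.array([[0.2, 0.1],
              [0.0, 0.3]])

approx = neumann_partial_sum(A, 60)
exact = np.linalg.inv(np.eye(2) - A)
```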


Exercise I.2.7 The set of all invertible matrices is a dense open subset of
the set of all $n \times n$ matrices. The set of all unitary matrices is a
compact subset of the set of all $n \times n$ matrices. These two sets are also
groups under multiplication. They are called the general linear group GL(n)
and the unitary group U(n), respectively.
Exercise I.2.8 For any matrix A the series

$$\exp A = I + A + \frac{A^2}{2!} + \cdots + \frac{A^n}{n!} + \cdots$$

converges. This is called the exponential of A. The matrix exp A is always
invertible and

$$(\exp A)^{-1} = \exp(-A).$$

Conversely, every invertible matrix can be expressed as the exponential of
some matrix. Every unitary matrix can be expressed as the exponential of
a skew-Hermitian matrix.
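
A truncated version of the series is easy to experiment with. The sketch below assumes NumPy; `exp_series` and its default of 30 terms are hypothetical choices suitable only for matrices of small norm, and K is an arbitrary skew-Hermitian example whose exponential should be a rotation.

```python
import numpy as np

def exp_series(A, terms=30):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    total = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for m in range(1, terms):
        term = term @ A / m       # now equals A^m / m!
        total = total + term
    return total

# A skew-Hermitian matrix: K* = -K (illustrative choice).
K = np.array([[0.0, -1.0],
              [1.0,  0.0]])
E = exp_series(K)
```

For this K the result should be the rotation through one radian, which is unitary, in line with the last statement of the exercise.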


The numerical range or the field of values of an operator A is the
subset W(A) of the complex plane defined as

$$W(A) = \{(x, Ax) : \|x\| = 1\}.$$

Note that

$$W(UAU^*) = W(A) \quad \text{for all } U \in U(n),$$
$$W(aA + bI) = aW(A) + bW(I) \quad \text{for all } a, b \in \mathbb{C}.$$

It is clear that if $\lambda$ is an eigenvalue of A, then $\lambda$ is in W(A).
It is also clear that W(A) is a closed set. An important property of W(A) is
that it is a convex set. This is called the Toeplitz-Hausdorff Theorem; an
outline of its proof is given in Problem I.6.2.


Exercise I.2.9 (i) When A is normal, the set W(A) is the convex hull
of the eigenvalues of A. For nonnormal matrices, W(A) may be bigger
than the convex hull of its eigenvalues. For Hermitian operators, the first
statement says that W(A) is the closed interval whose endpoints are the
smallest and the largest eigenvalues of A.

(ii) If a unit vector x belongs to the linear span of the eigenspaces
corresponding to eigenvalues $\lambda_1, \ldots, \lambda_k$ of a normal operator
A, then (x, Ax) lies in the convex hull of $\lambda_1, \ldots, \lambda_k$. (This
fact will be used frequently in Chapter III.)


The number w(A) defined as

$$w(A) = \sup_{\|x\| = 1} |(x, Ax)|$$

is called the numerical radius of A.


Exercise I.2.10 (i) The numerical radius defines a norm on L(H).
(ii) $w(UAU^*) = w(A)$ for all $U \in U(n)$.
(iii) $w(A) \le \|A\| \le 2w(A)$ for all A.
(iv) $w(A) = \|A\|$ if (but not only if) A is normal.


The spectral radius of an operator A is defined as

$$\operatorname{spr}(A) = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}.$$

We have noted that $\operatorname{spr}(A) \le w(A) \le \|A\|$, and that the
three are equal if (but not only if) the operator A is normal.
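
The three quantities can differ for a nonnormal matrix, as the following NumPy sketch suggests. It relies on an identity that is standard but not stated in the text: w(A) is the maximum over real θ of the largest eigenvalue of Re(e^{iθ}A). The function `numerical_radius` samples θ on a grid, so in general it gives only an approximation from below; the name, the grid size, and the matrix A are hypothetical choices.

```python
import numpy as np

def numerical_radius(A, samples=720):
    """Approximate w(A) as the maximum over theta of the largest
    eigenvalue of Re(e^{i theta} A), sampled on a uniform grid."""
    best = 0.0
    for theta in np.linspace(0.0, 2 * np.pi, samples, endpoint=False):
        B = np.exp(1j * theta) * A
        real_part = (B + B.conj().T) / 2
        best = max(best, np.linalg.eigvalsh(real_part)[-1])
    return best

# A nilpotent, nonnormal example (illustrative choice).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])

spr = max(abs(np.linalg.eigvals(A)))
w = numerical_radius(A)
op_norm = np.linalg.norm(A, 2)
```

For this A the three values are expected to be strictly increasing, and the bound $\|A\| \le 2w(A)$ of Exercise I.2.10 is attained.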


I.3 Direct Sums


If U, V are vector spaces, their direct sum is the space of columns
$\begin{pmatrix} u \\ v \end{pmatrix}$ with $u \in U$ and $v \in V$. This is a
vector space with vector operations naturally defined coordinatewise. If H, K
are Hilbert spaces, their direct sum is a Hilbert space with inner product
defined as

$$\left( \begin{pmatrix} h_1 \\ k_1 \end{pmatrix}, \begin{pmatrix} h_2 \\ k_2 \end{pmatrix} \right)
= (h_1, h_2)_H + (k_1, k_2)_K.$$

We will always denote this direct sum as $H \oplus K$.

If M and N are orthogonally complementary subspaces of H, then the
fact that every vector x in H has a unique representation x = u + v with
$u \in M$ and $v \in N$ implies that H is isomorphic to $M \oplus N$. This
isomorphism is given by a natural, fixed map. So, we say that
$H = M \oplus N$. When a distinction is necessary we call this an internal
direct sum. If M, N are subspaces of H complementary in the algebraic but not
in the orthogonal sense; i.e., if M and N are disjoint (that is,
$M \cap N = \{0\}$) and their linear span is H, then every vector x in H has a
unique decomposition x = u + v as before, but not with orthogonal summands. In
this case we write H = M + N and say H is the algebraic direct sum of M and N.

If $H = M \oplus N$ is an internal direct sum, we may define the injection
of M into H as the operator $I_M \in L(M, H)$ such that $I_M(u) = u$ for all
$u \in M$. Then, $I_M^*$ is an element of L(H, M) defined as $I_M^* x = Px$ for
all $x \in H$, where P is the orthoprojector onto M. Here one should note that
$I_M^*$ is not the same as P because they map into different spaces. That is
why their adjoints can be different. Similarly define $I_N$. Then,
$(I_M, I_N)$ is an isometry from the ordinary ("external") direct sum
$M \oplus N$ onto H.

If $H = M \oplus N$ and $A \in L(H)$, then using this isomorphism, we can write
A as a block-matrix

$$A = \begin{pmatrix} B & C \\ D & E \end{pmatrix},$$

where $B \in L(M)$, $C \in L(N, M)$, etc. Here, for example,
$C = I_M^* A I_N$.

The usual rules of matrix operations hold for block matrices. Adjoints are
obtained by taking "conjugate transposes" formally.
If the subspace M is invariant under A; i.e., $Ax \in M$ whenever $x \in M$,
then in the above block-matrix representation of A we must have D = 0.
Indeed, this condition is equivalent to M being invariant. If both M and its
orthogonal complement N are invariant under A, we say that M reduces
A. In this case, both C and D are 0. We then say that the operator A is
the direct sum of B and E and write $A = B \oplus E$.


Exercise I.3.1 Let $A = A_1 \oplus A_2$. Show that

(i) W(A) is the convex hull of $W(A_1)$ and $W(A_2)$; i.e., the smallest convex
set containing $W(A_1) \cup W(A_2)$.

(ii) $\|A\| = \max(\|A_1\|, \|A_2\|)$,
$\operatorname{spr}(A) = \max(\operatorname{spr}(A_1), \operatorname{spr}(A_2))$,
$w(A) = \max(w(A_1), w(A_2))$.


Direct sums in which each summand $H_j$ is the same space H arise often in
practice. Very often, some properties of an operator A on H are reflected in
those of some other operators on $H \oplus H$. This is illustrated in the
following propositions.


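Lemma I.3.2 Let $A \in L(H)$. Then, the operators
$\begin{pmatrix} A & A \\ A & A \end{pmatrix}$ and
$\begin{pmatrix} 2A & 0 \\ 0 & 0 \end{pmatrix}$ are
unitarily equivalent in $L(H \oplus H)$.


Proof. The equivalence is implemented by the unitary operator

$$\frac{1}{\sqrt{2}} \begin{pmatrix} I & -I \\ I & I \end{pmatrix}. \qquad \blacksquare$$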

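Corollary I.3.3 An operator A on H is positive if and only if the operator
$\begin{pmatrix} A & A \\ A & A \end{pmatrix}$ on $H \oplus H$ is positive.


This can also be seen by writing

$$\begin{pmatrix} A & A \\ A & A \end{pmatrix}
= \begin{pmatrix} A^{1/2} & 0 \\ A^{1/2} & 0 \end{pmatrix}
\begin{pmatrix} A^{1/2} & A^{1/2} \\ 0 & 0 \end{pmatrix},$$

and using Exercise I.2.2.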


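Corollary I.3.4 For every $A \in L(H)$ the operator
$\begin{pmatrix} |A| & A^* \\ A & |A^*| \end{pmatrix}$ is positive.


Proof. Let A = UP be the polar decomposition of A. Then,

$$\begin{pmatrix} |A| & A^* \\ A & |A^*| \end{pmatrix}
= \begin{pmatrix} P & PU^* \\ UP & UPU^* \end{pmatrix}
= \begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix}
\begin{pmatrix} P & P \\ P & P \end{pmatrix}
\begin{pmatrix} I & 0 \\ 0 & U^* \end{pmatrix}.$$

Note that $\begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix}$ is a unitary operator
on $H \oplus H$. $\blacksquare$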


Proposition 1.3.5 An operator A on H is contractive af and only if the 
operator (4, 4 ) on HOH is positive. 
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Proof. If A has the singular value decomposition A = USV*, then 


(aT )=(oe)(s7)Co &): 


Hence (4 4’) is positive if and only if (¢ 3) is positive. Also, ||Al] = ||S|. 
So we may assume, without loss of generality, that A = S. 

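Now let $W$ be the unitary operator on $\mathcal{H} \oplus \mathcal{H}$ that sends the orthonormal basis $\{e_1, e_2, \ldots, e_{2n}\}$ to the basis $\{e_1, e_{n+1}, e_2, e_{n+2}, \ldots, e_n, e_{2n}\}$. Then, the unitary conjugation by $W$ transforms the matrix $\begin{pmatrix} I & S \\ S & I \end{pmatrix}$ to a direct sum of $n$ two-by-two matrices
$$\begin{pmatrix} 1 & s_1 \\ s_1 & 1 \end{pmatrix} \oplus \begin{pmatrix} 1 & s_2 \\ s_2 & 1 \end{pmatrix} \oplus \cdots \oplus \begin{pmatrix} 1 & s_n \\ s_n & 1 \end{pmatrix}.$$
This is positive if and only if each of the summands is positive, which happens if and only if $s_j \le 1$ for all $j$; i.e., $S$ is a contraction. $\blacksquare$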


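Exercise I.3.6 If $A$ is a contraction, show that
$$A^*(I - AA^*)^{1/2} = (I - A^*A)^{1/2} A^*.$$
Use this to show that if $A$ is a contraction on $\mathcal{H}$, then the operators
$$U = \begin{pmatrix} A & (I - AA^*)^{1/2} \\ (I - A^*A)^{1/2} & -A^* \end{pmatrix}, \qquad V = \begin{pmatrix} A & -(I - AA^*)^{1/2} \\ (I - A^*A)^{1/2} & A^* \end{pmatrix}$$
are unitary operators on $\mathcal{H} \oplus \mathcal{H}$.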


Exercise I.3.7 For every matrix $A$, the matrix $\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}$ is invertible and its inverse is $\begin{pmatrix} I & -A \\ 0 & I \end{pmatrix}$. Use this to show that if $A, B$ are any two $n \times n$ matrices, then
$$\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}^{-1} \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}.$$
This implies that $AB$ and $BA$ have the same eigenvalues. (This last fact can be proved in another way as follows. If $B$ is invertible, then $AB = B^{-1}(BA)B$. So, $AB$ and $BA$ have the same eigenvalues. Since invertible matrices are dense in the space of all matrices, and a general known fact in complex analysis is that the roots of a polynomial vary continuously with the coefficients, the above conclusion also holds in general.)
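
The block identity of Exercise I.3.7 can be checked on a small example. The following is only an illustrative sketch in plain Python with exact integer arithmetic; the helper names (`matmul`, `block`, `check_similarity`, `power_traces`) and the sample matrices are assumptions made for this illustration. The matrix $B$ is chosen singular, so the argument via $B^{-1}$ is not available for it. The traces of powers are compared because, by Newton's identities, the traces of $X, X^2, \ldots, X^n$ determine the eigenvalues of an $n \times n$ matrix $X$ with multiplicities.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def zeros(n):
    return [[0] * n for _ in range(n)]

def neg(X):
    return [[-v for v in row] for row in X]

def block(P, Q, R, S):
    """Assemble the block matrix (P Q; R S) from four n x n blocks."""
    return [p + q for p, q in zip(P, Q)] + [r + s for r, s in zip(R, S)]

def check_similarity(A, B):
    """Return both sides of the block identity in Exercise I.3.7."""
    n = len(A)
    I, O = identity(n), zeros(n)
    S = block(I, A, O, I)               # (I A; 0 I)
    S_inv = block(I, neg(A), O, I)      # its inverse (I -A; 0 I)
    M = block(matmul(A, B), O, B, O)    # (AB 0; B 0)
    lhs = matmul(matmul(S_inv, M), S)
    rhs = block(O, O, B, matmul(B, A))  # (0 0; B BA)
    return lhs, rhs

def power_traces(X, count):
    """Traces of X, X^2, ..., X^count."""
    traces, P = [], X
    for _ in range(count):
        traces.append(sum(P[i][i] for i in range(len(P))))
        P = matmul(P, X)
    return traces

A = [[1, 2], [3, 4]]
B = [[0, 1], [0, 0]]   # singular on purpose
lhs, rhs = check_similarity(A, B)
print(lhs == rhs)
print(power_traces(matmul(A, B), 2), power_traces(matmul(B, A), 2))
```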


Direct sums with more than two summands are defined in the same way. We will denote the direct sum of spaces $\mathcal{H}_1, \ldots, \mathcal{H}_k$ as $\oplus_{j=1}^{k} \mathcal{H}_j$, or simply as $\oplus_j \mathcal{H}_j$.
I.4 Tensor Products


Let $V_j$, $1 \le j \le k$, be vector spaces. A map $F$ from the Cartesian product $V_1 \times \cdots \times V_k$ to another vector space $W$ is called multilinear if it depends linearly on each of the arguments. When $W = \mathbb{C}$, such maps are called multilinear functionals. When $k = 2$, the word multilinear is replaced by bilinear. Bilinear maps, thus, are maps $F : V_1 \times V_2 \to W$ that satisfy the conditions
$$F(u, av_1 + bv_2) = aF(u, v_1) + bF(u, v_2),$$
$$F(au_1 + bu_2, v) = aF(u_1, v) + bF(u_2, v),$$
for all $a, b \in \mathbb{C}$; $u, u_1, u_2 \in V_1$ and $v, v_1, v_2 \in V_2$. We will be looking most often at the special situation when each $V_j$ is the same vector space.


As a special example consider a Hilbert space $\mathcal{H}$ and fix two vectors $x, y$ in it. Then,
$$F(u, v) = \langle x, u \rangle \langle y, v \rangle$$
is a bilinear functional on $\mathcal{H}$.

We see from this example that it is equally natural to consider conjugate-multilinear functionals as well. Even more generally we could study functions that are linear in some variables and conjugate-linear in others. As an example, let $A \in \mathcal{L}(\mathcal{H}, \mathcal{K})$ and for $u \in \mathcal{K}$ and $v \in \mathcal{H}$, let $F(u, v) = \langle u, Av \rangle_{\mathcal{K}}$. Then, $F$ depends linearly on $v$ and conjugate-linearly on $u$. Such functionals are called sesquilinear; an inner product is a functional of this sort. The example given above is the "most general" example of a sesquilinear functional: if $F(u, v)$ is any sesquilinear functional on $\mathcal{K} \times \mathcal{H}$, then there exists a unique operator $A \in \mathcal{L}(\mathcal{H}, \mathcal{K})$ such that $F(u, v) = \langle u, Av \rangle$.

In this sense our first example is not the most general example of a bilinear functional. Bilinear functionals $F(u, v)$ on $\mathcal{H}$ that can be expressed as $F(u, v) = \langle x, u \rangle \langle y, v \rangle$ for some fixed $x, y \in \mathcal{H}$ are called elementary. They are special as the following exercise will show.


Exercise I.4.1 Let $x, y, z$ be linearly independent vectors in $\mathcal{H}$. Find a necessary and sufficient condition that a vector $w$ must satisfy in order that the bilinear functional
$$F(u, v) = \langle x, u \rangle \langle y, v \rangle + \langle z, u \rangle \langle w, v \rangle$$
is elementary.


The set of all bilinear functionals is a vector space. The result of this exercise shows that the subset consisting of elementary functionals is not closed under addition. We will soon see that a convenient basis for this vector space can be constructed with elementary functionals as its members.

The procedure, called the tensor product construction, starts by taking formal linear combinations of symbols $x \otimes y$ with $x \in \mathcal{H}$, $y \in \mathcal{K}$; then reducing this space modulo suitable equivalence relations; then identifying the resulting space with the space of bilinear functionals.


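More precisely, consider all finite sums of the type $\sum_i c_i (x_i \otimes y_i)$, $c_i \in \mathbb{C}$, $x_i \in \mathcal{H}$, $y_i \in \mathcal{K}$, and manipulate them formally as linear combinations. In this space the expressions
$$\begin{aligned} & a(x \otimes y) - (ax \otimes y) \\ & a(x \otimes y) - (x \otimes ay) \\ & x_1 \otimes y + x_2 \otimes y - (x_1 + x_2) \otimes y \\ & x \otimes y_1 + x \otimes y_2 - x \otimes (y_1 + y_2) \end{aligned}$$
are next defined to be equivalent to $0$, for all $a \in \mathbb{C}$; $x, x_1, x_2 \in \mathcal{H}$ and $y, y_1, y_2 \in \mathcal{K}$. The set of all linear combinations of expressions $x \otimes y$ for $x \in \mathcal{H}$, $y \in \mathcal{K}$, after reduction modulo these equivalences, is called the tensor product of $\mathcal{H}$ and $\mathcal{K}$ and is denoted as $\mathcal{H} \otimes \mathcal{K}$.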


Each term $c(x \otimes y)$ determines a conjugate-bilinear functional $F^*(u, v)$ on $\mathcal{H} \times \mathcal{K}$ by the natural rule
$$F^*(u, v) = c \langle u, x \rangle \langle v, y \rangle.$$
This can be extended to sums of such terms, and the equivalences were chosen in such a way that equivalent expressions (i.e., expressions giving the same element of $\mathcal{H} \otimes \mathcal{K}$) give the same functional. The complex conjugate of each such functional gives a bilinear functional. These ideas can be extended directly to $k$-linear functionals, including those that are linear in some of the arguments and conjugate-linear in others.


Theorem I.4.2 The space of all bilinear functionals on $\mathcal{H}$ is linearly spanned by the elementary ones. If $(e_1, \ldots, e_n)$ is a fixed orthonormal basis of $\mathcal{H}$, then to every bilinear functional $F$ there correspond unique vectors $x_1, \ldots, x_n$ such that
$$F^* = \sum_j e_j \otimes x_j.$$
Every sequence $x_j$, $1 \le j \le n$, leads to a bilinear functional in this way.


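Proof. Let $F$ be a bilinear functional on $\mathcal{H}$. For each $j$, $F^*(e_j, v)$ is a conjugate-linear function of $v$. Hence there exists a unique vector $x_j$ such that $F^*(e_j, v) = \langle v, x_j \rangle$ for all $v$.

Now, if $u = \sum_j a_j e_j$ is any vector in $\mathcal{H}$, then $F(u, v) = \sum_j a_j F(e_j, v) = \sum_j \langle e_j, u \rangle \langle x_j, v \rangle$. In other words, $F^* = \sum_j e_j \otimes x_j$ as asserted. $\blacksquare$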


A more symmetric form of the above statement is the following: 


Corollary I.4.3 If $(e_1, \ldots, e_n)$ and $(f_1, \ldots, f_n)$ are two fixed orthonormal bases of $\mathcal{H}$, then every bilinear functional $F$ on $\mathcal{H}$ has a unique representation $F = \sum_{i,j} a_{ij} (e_i \otimes f_j)^*$.

(Most often, the choice $(e_1, \ldots, e_n) = (f_1, \ldots, f_n)$ is the convenient one for using the above representations.)

Thus, it is natural to denote the space of conjugate-bilinear functionals on $\mathcal{H}$ by $\mathcal{H} \otimes \mathcal{H}$. This is an $n^2$-dimensional vector space. The inner product on this space is defined by putting
$$\langle u_1 \otimes u_2, v_1 \otimes v_2 \rangle = \langle u_1, v_1 \rangle \langle u_2, v_2 \rangle,$$
and then extending this definition to all of $\mathcal{H} \otimes \mathcal{H}$ in a natural way. It is easy to verify that this definition is consistent with the equivalences used in defining the tensor product. If $(e_1, \ldots, e_n)$ and $(f_1, \ldots, f_n)$ are orthonormal bases in $\mathcal{H}$, then $e_i \otimes f_j$, $1 \le i, j \le n$, form an orthonormal basis in $\mathcal{H} \otimes \mathcal{H}$. For the purposes of computation it is useful to order this basis lexicographically: we say that $e_i \otimes f_j$ precedes $e_k \otimes f_\ell$ if and only if either $i < k$, or $i = k$ and $j < \ell$.

Tensor products such as $\mathcal{H} \otimes \mathcal{K}$ or $\mathcal{K}^* \otimes \mathcal{H}$ can be defined by imitating the above procedure. Here the space $\mathcal{K}^*$ is the space of all conjugate-linear functionals on $\mathcal{K}$. This space is called the dual space of $\mathcal{K}$. There is a natural identification between $\mathcal{K}$ and $\mathcal{K}^*$ via a conjugate-linear, norm preserving bijection.


Exercise I.4.4 (i) There is a natural isomorphism between the spaces $\mathcal{K} \otimes \mathcal{H}^*$ and $\mathcal{L}(\mathcal{H}, \mathcal{K})$ in which the elementary tensor $k \otimes h^*$ corresponds to the linear map that takes a vector $u$ of $\mathcal{H}$ to $\langle h, u \rangle k$. This linear transformation has rank one, and all rank one transformations can be obtained in this way.

(ii) An explicit construction of this isomorphism $\varphi$ is outlined below. Let $e_1, \ldots, e_n$ be an orthonormal basis for $\mathcal{H}$ and for $\mathcal{H}^*$. Let $f_1, \ldots, f_m$ be an orthonormal basis for $\mathcal{K}$. Identify each element of $\mathcal{L}(\mathcal{H}, \mathcal{K})$ with its matrix with respect to these bases. Let $E_{ij}$ be the matrix all of whose entries are zero except the $(i, j)$-entry, which is 1. Show that $\varphi(f_i \otimes e_j) = E_{ij}$ for all $1 \le i \le m$, $1 \le j \le n$. Thus, if $A$ is any $m \times n$ matrix with entries $a_{ij}$, then
$$\varphi^{-1}(A) = \sum_{i,j} a_{ij} (f_i \otimes e_j) = \sum_j (Ae_j) \otimes e_j.$$

(iii) The space $\mathcal{L}(\mathcal{H}, \mathcal{K})$ is a Hilbert space with inner product $\langle A, B \rangle = \operatorname{tr} A^*B$. The set $E_{ij}$, $1 \le i \le m$, $1 \le j \le n$, is an orthonormal basis for this space. Show that the map $\varphi$ is a Hilbert space isomorphism; i.e., $\langle \varphi^{-1}(A), \varphi^{-1}(B) \rangle = \langle A, B \rangle$ for all $A, B$.


Corresponding facts about multilinear functionals and tensor products of several spaces are proved in the same way. We will use the notation $\otimes^k \mathcal{H}$ for the $k$-fold tensor product $\mathcal{H} \otimes \mathcal{H} \otimes \cdots \otimes \mathcal{H}$.

Tensor products of linear operators are defined as follows. We first define $A \otimes B$ on elementary tensors by putting $(A \otimes B)(x \otimes y) = Ax \otimes By$. We then extend this definition linearly to all linear combinations of elementary tensors, i.e., to all of $\mathcal{H} \otimes \mathcal{H}$. This extension involves no inconsistency.
It is obvious that $(A \otimes B)(C \otimes D) = AC \otimes BD$, that the identity on $\mathcal{H} \otimes \mathcal{H}$ is given by $I \otimes I$, and that if $A$ and $B$ are invertible, then so is $A \otimes B$ and $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$. A one-line verification shows that $(A \otimes B)^* = A^* \otimes B^*$. It follows that $A \otimes B$ is Hermitian if (but not only if) $A$ and $B$ are Hermitian; $A \otimes B$ is unitary if (but not only if) $A$ and $B$ are unitary; $A \otimes B$ is normal if (and only if) $A$ and $B$ are normal. (The trivial cases $A = 0$, or $B = 0$, must be excluded for the last assertion to be valid.)


Exercise I.4.5 Suppose it is known that $\mathcal{M}$ is an invariant subspace for $A$. What invariant subspaces for $A \otimes A$ can be obtained from this information alone?


For operators $A, B$ on different spaces $\mathcal{H}$ and $\mathcal{K}$, the tensor product can be defined in the same way as above. This gives an operator $A \otimes B$ on $\mathcal{H} \otimes \mathcal{K}$. Many of the assertions made earlier for the case $\mathcal{H} = \mathcal{K}$ remain true in this situation.


Exercise I.4.6 Let $A$ and $B$ be two matrices (not necessarily of the same size). Relative to the lexicographically ordered basis on the space of tensors, the matrix for $A \otimes B$ can be written in block form as follows: if $A = (a_{ij})$, then
$$A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{n1}B & \cdots & a_{nn}B \end{pmatrix}.$$
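
The block form in Exercise I.4.6 translates directly into a short computation. The sketch below, in plain Python with integer matrices, is only an illustration; the function names `kron` and `matmul` and the sample matrices are assumptions of this example. It builds the matrix of $A \otimes B$ in the lexicographically ordered basis and checks the rule $(A \otimes B)(C \otimes D) = AC \otimes BD$ on one instance.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def kron(A, B):
    """Matrix of A (x) B: the block matrix whose (i, j) block is A[i][j] * B.

    Rows are indexed by pairs (row of A, row of B) and columns by pairs
    (column of A, column of B), both in lexicographic order.
    """
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
D = [[1, 1], [0, 2]]

AB_kron = kron(A, B)
lhs = matmul(kron(A, B), kron(C, D))      # (A (x) B)(C (x) D)
rhs = kron(matmul(A, C), matmul(B, D))    # AC (x) BD
print(AB_kron)
print(lhs == rhs)
```

The matrices need not be square or of equal size for `kron`; only the products `matmul(A, C)` and `matmul(B, D)` impose compatibility conditions.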
Especially important are the operators $A \otimes A \otimes \cdots \otimes A$, which are $k$-fold tensor products of an operator $A \in \mathcal{L}(\mathcal{H})$. Such a product will be written more briefly as $A^{\otimes k}$ or $\otimes^k A$. This is an operator on the $n^k$-dimensional space $\otimes^k \mathcal{H}$.


Some of the easily proved and frequently used properties of these products are summarised below:


1. $(\otimes^k A)(\otimes^k B) = \otimes^k (AB)$.

2. $(\otimes^k A)^{-1} = \otimes^k A^{-1}$ when either inverse exists.

3. $(\otimes^k A)^* = \otimes^k A^*$.

4. If $A$ is Hermitian, unitary, normal or positive, then so is $\otimes^k A$.

5. If $\alpha_1, \ldots, \alpha_k$ (not necessarily distinct) are eigenvalues of $A$ with eigenvectors $u_1, \ldots, u_k$, respectively, then $\alpha_1 \cdots \alpha_k$ is an eigenvalue of $\otimes^k A$ and $u_1 \otimes \cdots \otimes u_k$ is an eigenvector for it.

6. If $s_{i_1}, \ldots, s_{i_k}$ (not necessarily distinct) are singular values of $A$, then $s_{i_1} \cdots s_{i_k}$ is a singular value of $\otimes^k A$.

7. $\| \otimes^k A \| = \|A\|^k$.


The reader should formulate and prove analogous statements for tensor products $A_1 \otimes A_2 \otimes \cdots \otimes A_k$ of different operators.
I.5 Symmetry Classes


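In the space $\otimes^k \mathcal{H}$ there are two especially important subspaces (for nontrivial cases, $k > 1$ and $n > 1$).

The antisymmetric tensor product of vectors $x_1, \ldots, x_k$ in $\mathcal{H}$ is defined as
$$x_1 \wedge \cdots \wedge x_k = (k!)^{-1/2} \sum_\sigma \varepsilon_\sigma\, x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(k)},$$
where $\sigma$ runs over all permutations of the $k$ indices and $\varepsilon_\sigma$ is $\pm 1$, depending on whether $\sigma$ is an even or an odd permutation. ($\varepsilon_\sigma$ is called the signature of $\sigma$.) The factor $(k!)^{-1/2}$ is chosen so that if $x_j$ are orthonormal, then $x_1 \wedge \cdots \wedge x_k$ is a unit vector. The antisymmetry of this product means that
$$x_1 \wedge \cdots \wedge x_i \wedge \cdots \wedge x_j \wedge \cdots \wedge x_k = -\, x_1 \wedge \cdots \wedge x_j \wedge \cdots \wedge x_i \wedge \cdots \wedge x_k,$$
i.e., interchanging the position of any two of the factors in the product amounts to a change of sign. In particular, $x_1 \wedge \cdots \wedge x_k = 0$ if any two of the factors are equal.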

The span of all antisymmetric tensors $x_1 \wedge \cdots \wedge x_k$ in $\otimes^k \mathcal{H}$ is denoted by $\wedge^k \mathcal{H}$. This is called the $k$th antisymmetric tensor product (or tensor power) of $\mathcal{H}$.

Given an orthonormal basis $(e_1, \ldots, e_n)$ in $\mathcal{H}$, there is a standard way of constructing an orthonormal basis in $\wedge^k \mathcal{H}$. Let $Q_{k,n}$ denote the set of all strictly increasing $k$-tuples chosen from $\{1, 2, \ldots, n\}$; i.e., $\mathcal{I} \in Q_{k,n}$ if and only if $\mathcal{I} = (i_1, i_2, \ldots, i_k)$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$. For such an $\mathcal{I}$ let $e_{\mathcal{I}} = e_{i_1} \wedge \cdots \wedge e_{i_k}$. Then, $\{e_{\mathcal{I}} : \mathcal{I} \in Q_{k,n}\}$ gives an orthonormal basis of $\wedge^k \mathcal{H}$. Such $\mathcal{I}$ are sometimes called multi-indices. It is conventional to order them lexicographically. Note that the cardinality of $Q_{k,n}$, and hence the dimensionality of $\wedge^k \mathcal{H}$, is $\binom{n}{k}$.

If in particular $k = n$, the space $\wedge^k \mathcal{H}$ is 1-dimensional. This plays a special role later on. When $k > n$ the space $\wedge^k \mathcal{H}$ is $\{0\}$.


Exercise I.5.1 Show that the inner product $\langle x_1 \wedge \cdots \wedge x_k, y_1 \wedge \cdots \wedge y_k \rangle$ is equal to the determinant of the $k \times k$ matrix $(\langle x_i, y_j \rangle)$.
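
The identity of Exercise I.5.1 can be tested by building the antisymmetric tensors coordinate by coordinate. The sketch below is an illustration only, restricted to real vectors with integer entries so that no complex conjugation is involved; the function names and the sample vectors are assumptions of this example. To stay in exact arithmetic it computes $\sqrt{k!}$ times each wedge product and divides the resulting inner product by $k!$.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def sign(perm):
    """Signature of a permutation given as a tuple of 0-based indices."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def det(M):
    """Determinant by the permutation expansion (adequate for small matrices)."""
    total = 0
    for p in permutations(range(len(M))):
        term = sign(p)
        for i in range(len(M)):
            term *= M[i][p[i]]
        total += term
    return total

def antisymmetrise(vectors):
    """Coordinates of sum_sigma eps_sigma x_sigma(1) (x) ... (x) x_sigma(k).

    This is sqrt(k!) times the wedge product; coordinates are indexed by
    k-tuples of basis indices.
    """
    k, n = len(vectors), len(vectors[0])
    tensor = {}
    for idx in product(range(n), repeat=k):
        coeff = 0
        for p in permutations(range(k)):
            term = sign(p)
            for slot in range(k):
                term *= vectors[p[slot]][idx[slot]]
            coeff += term
        tensor[idx] = coeff
    return tensor

def wedge_inner(xs, ys):
    """Inner product of x_1 ^ ... ^ x_k and y_1 ^ ... ^ y_k for real vectors."""
    tx, ty = antisymmetrise(xs), antisymmetrise(ys)
    raw = sum(tx[i] * ty[i] for i in tx)
    return Fraction(raw, factorial(len(xs)))  # undo the two sqrt(k!) factors

xs = [[1, 2, 0], [0, 1, 3]]
ys = [[2, 0, 1], [1, 1, 1]]
gram = [[sum(a * b for a, b in zip(x, y)) for y in ys] for x in xs]
print(wedge_inner(xs, ys) == det(gram))
```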


The symmetric tensor product of $x_1, \ldots, x_k$ is defined as
$$x_1 \vee \cdots \vee x_k = (k!)^{-1/2} \sum_\sigma x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(k)},$$
where $\sigma$, as before, runs over all permutations of the $k$ indices. The linear span of all these vectors comprises the subspace $\vee^k \mathcal{H}$ of $\otimes^k \mathcal{H}$. This is called the $k$th symmetric tensor power of $\mathcal{H}$.

Let $G_{k,n}$ denote the set of all non-decreasing $k$-tuples chosen from $\{1, 2, \ldots, n\}$; i.e., $\mathcal{I} \in G_{k,n}$ if and only if $\mathcal{I} = (i_1, \ldots, i_k)$, where $1 \le i_1 \le i_2 \le \cdots \le i_k \le n$. If such an $\mathcal{I}$ consists of $\ell$ distinct indices $i_1, \ldots, i_\ell$ with multiplicities $m_1, \ldots, m_\ell$, respectively, put $m(\mathcal{I}) = m_1!\, m_2! \cdots m_\ell!$. Given an orthonormal basis $(e_1, \ldots, e_n)$ of $\mathcal{H}$ define, for every $\mathcal{I} \in G_{k,n}$, $e_{\mathcal{I}} = e_{i_1} \vee e_{i_2} \vee \cdots \vee e_{i_k}$. Then, the set $\{m(\mathcal{I})^{-1/2} e_{\mathcal{I}} : \mathcal{I} \in G_{k,n}\}$ is an orthonormal basis in $\vee^k \mathcal{H}$. Again, it is conventional to order these multi-indices lexicographically. The cardinality of the set $G_{k,n}$, and hence the dimensionality of the space $\vee^k \mathcal{H}$, is $\binom{n+k-1}{k}$.

Notice that the expressions for the basis in $\wedge^k \mathcal{H}$ are simpler because $m(\mathcal{I}) = 1$ for $\mathcal{I} \in Q_{k,n}$.
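
The index sets $Q_{k,n}$ and $G_{k,n}$ and the weights $m(\mathcal{I})$ are easy to enumerate. The following sketch uses only the Python standard library; the function names `Q`, `G` and `m` are chosen here to mirror the notation above and are not part of any library. It lists the multi-indices in lexicographic order and confirms the two dimension counts in small cases.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement
from math import comb, factorial

def Q(k, n):
    """Strictly increasing k-tuples from {1, ..., n}, in lexicographic order."""
    return list(combinations(range(1, n + 1), k))

def G(k, n):
    """Non-decreasing k-tuples from {1, ..., n}, in lexicographic order."""
    return list(combinations_with_replacement(range(1, n + 1), k))

def m(I):
    """m(I): the product of the factorials of the multiplicities in I."""
    result = 1
    for multiplicity in Counter(I).values():
        result *= factorial(multiplicity)
    return result

print(Q(2, 3))                               # basis labels for the second antisymmetric power, n = 3
print(G(2, 3))                               # basis labels for the second symmetric power, n = 3
print(len(Q(2, 4)) == comb(4, 2))            # dimension of the antisymmetric power
print(len(G(3, 3)) == comb(3 + 3 - 1, 3))    # dimension of the symmetric power
print([(I, m(I)) for I in G(2, 2)])
```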


Exercise I.5.2 The elementary tensors $x \otimes \cdots \otimes x$, with all factors equal, are all in the subspace $\vee^k \mathcal{H}$. Do they span it?


Exercise I.5.3 Let $\mathcal{M}$ be a $p$-dimensional subspace of $\mathcal{H}$ and $\mathcal{N}$ its orthogonal complement. Choosing $j$ vectors from $\mathcal{M}$ and $k - j$ vectors from $\mathcal{N}$ and forming the linear span of the antisymmetric tensor products of all such vectors, we get different subspaces of $\wedge^k \mathcal{H}$; for example, one of those is $\wedge^k \mathcal{M}$. Determine all the subspaces thus obtained and their dimensionalities. Do the same for $\vee^k \mathcal{H}$.


Exercise I.5.4 If $\dim \mathcal{H} = 3$, then $\dim \otimes^3 \mathcal{H} = 27$, $\dim \wedge^3 \mathcal{H} = 1$ and $\dim \vee^3 \mathcal{H} = 10$. In terms of an orthonormal basis of $\mathcal{H}$, write an element of $(\wedge^3 \mathcal{H} \oplus \vee^3 \mathcal{H})^\perp$.


The permanent of a matrix $A = (a_{ij})$ is defined as
$$\operatorname{per} A = \sum_\sigma a_{1\sigma(1)} \cdots a_{n\sigma(n)},$$


where o varies over all permutations on n symbols. Note that, in contrast 
to the determinant, the permanent is not invariant under similarities. Thus, 
matrices of the same operator relative to different bases may have different 
permanents. 
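
A two-by-two example makes the last remark concrete. The sketch below (plain Python, integer matrices; the names `per`, `det`, `matmul` and the particular matrices $A$ and $S$ are assumptions of this illustration) computes the permanent from its defining sum and compares $A$ with the similar matrix $SAS^{-1}$.

```python
from itertools import permutations

def sign(perm):
    """Signature of a permutation given as a tuple of 0-based indices."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def per(M):
    """Permanent: the sum over all permutations of a_{1 s(1)} ... a_{n s(n)}."""
    total = 0
    for p in permutations(range(len(M))):
        term = 1
        for i in range(len(M)):
            term *= M[i][p[i]]
        total += term
    return total

def det(M):
    """Determinant: the same sum with each term weighted by the signature."""
    total = 0
    for p in permutations(range(len(M))):
        term = sign(p)
        for i in range(len(M)):
            term *= M[i][p[i]]
        total += term
    return total

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2], [3, 4]]
S = [[1, 0], [1, 1]]
S_inv = [[1, 0], [-1, 1]]
B = matmul(matmul(S, A), S_inv)   # B = S A S^{-1} is similar to A

print(det(A), det(B))   # the determinants agree
print(per(A), per(B))   # the permanents differ
```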


Exercise I.5.5 Show that the inner product $\langle x_1 \vee \cdots \vee x_k, y_1 \vee \cdots \vee y_k \rangle$ is equal to the permanent of the $k \times k$ matrix $(\langle x_i, y_j \rangle)$.


The spaces $\wedge^k \mathcal{H}$ and $\vee^k \mathcal{H}$ are also referred to as "symmetry classes" of tensors; there are other such classes in $\otimes^k \mathcal{H}$. Another way to look at them is as the ranges of the respective symmetry operators. Define $P_\wedge$ and $P_\vee$ as linear operators on $\otimes^k \mathcal{H}$ by first defining them on the elementary tensors as
$$P_\wedge (x_1 \otimes \cdots \otimes x_k) = (k!)^{-1/2}\, x_1 \wedge \cdots \wedge x_k,$$
$$P_\vee (x_1 \otimes \cdots \otimes x_k) = (k!)^{-1/2}\, x_1 \vee \cdots \vee x_k,$$
and extending them by linearity to the whole space. (Again it should be verified that this can be done consistently.) The constant factor in the above definitions has been chosen so that both these operators are idempotent. They are also Hermitian. The ranges of these orthoprojectors are $\wedge^k \mathcal{H}$ and $\vee^k \mathcal{H}$, respectively.
If $A \in \mathcal{L}(\mathcal{H})$, then $Ax_1 \wedge \cdots \wedge Ax_k$ lies in $\wedge^k \mathcal{H}$ for all $x_1, \ldots, x_k$ in $\mathcal{H}$. Using this, one sees that the space $\wedge^k \mathcal{H}$ is invariant under the operator $\otimes^k A$. The restriction of $\otimes^k A$ to this invariant subspace is denoted by $\wedge^k A$ or $A^{\wedge k}$. This is called the $k$th antisymmetric tensor power or the $k$th Grassmann power of $A$. We could have also defined it by first defining it on the elementary antisymmetric tensors $x_1 \wedge \cdots \wedge x_k$ as
$$\wedge^k A (x_1 \wedge \cdots \wedge x_k) = Ax_1 \wedge \cdots \wedge Ax_k$$
and then extending it linearly to the span $\wedge^k \mathcal{H}$ of these tensors.


Exercise I.5.6 Let $A$ be a nilpotent operator. Show how to obtain, from a Jordan basis for $A$, a Jordan basis for $\wedge^2 A$.


The space $\vee^k \mathcal{H}$ is also invariant under the operator $\otimes^k A$. The restriction of $\otimes^k A$ to this invariant subspace is written as $\vee^k A$ or $A^{\vee k}$ and called the $k$th symmetric tensor power of $A$.


Some essential and simple properties of these operators are summarised 
below: 


1. $(\wedge^k A)(\wedge^k B) = \wedge^k (AB)$, $(\vee^k A)(\vee^k B) = \vee^k (AB)$.

2. $(\wedge^k A)^* = \wedge^k A^*$, $(\vee^k A)^* = \vee^k A^*$.

3. $(\wedge^k A)^{-1} = \wedge^k A^{-1}$, $(\vee^k A)^{-1} = \vee^k A^{-1}$.

4. If $A$ is Hermitian, unitary, normal or positive, then so are $\wedge^k A$ and $\vee^k A$.

5. If $\alpha_1, \ldots, \alpha_k$ are eigenvalues of $A$ (not necessarily distinct) belonging to eigenvectors $u_1, \ldots, u_k$, respectively, then $\alpha_1 \cdots \alpha_k$ is an eigenvalue of $\vee^k A$ belonging to eigenvector $u_1 \vee \cdots \vee u_k$; if in addition the vectors $u_j$ are linearly independent, then $\alpha_1 \cdots \alpha_k$ is an eigenvalue of $\wedge^k A$ belonging to eigenvector $u_1 \wedge \cdots \wedge u_k$.

6. If $s_1, \ldots, s_n$ are the singular values of $A$, then the singular values of $\wedge^k A$ are $s_{i_1} \cdots s_{i_k}$, where $(i_1, \ldots, i_k)$ vary over $Q_{k,n}$; the singular values of $\vee^k A$ are $s_{i_1} \cdots s_{i_k}$, where $(i_1, \ldots, i_k)$ vary over $G_{k,n}$.

7. $\operatorname{tr} \wedge^k A$ is the $k$th elementary symmetric polynomial in the eigenvalues of $A$; $\operatorname{tr} \vee^k A$ is the $k$th complete symmetric polynomial in the eigenvalues of $A$.


(These polynomials are defined as follows. Given any $n$-tuple $(\alpha_1, \ldots, \alpha_n)$ of numbers or other commuting objects, the $k$th elementary symmetric polynomial in them is the sum of all terms $\alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_k}$ for $(i_1, i_2, \ldots, i_k)$ in $Q_{k,n}$; the $k$th complete symmetric polynomial is the sum of all terms $\alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_k}$ for $(i_1, i_2, \ldots, i_k)$ in $G_{k,n}$.)

For $A \in \mathcal{L}(\mathcal{H})$, consider the operator $A \otimes I \otimes \cdots \otimes I + I \otimes A \otimes I \otimes \cdots \otimes I + \cdots + I \otimes I \otimes \cdots \otimes A$. (There are $k$ summands, each of which is a product of $k$ factors.) The eigenvalues of this operator on $\otimes^k \mathcal{H}$ are sums of eigenvalues of $A$. Both the spaces $\wedge^k \mathcal{H}$ and $\vee^k \mathcal{H}$ are invariant under this operator. One pleasant way to see this is to regard this operator as the $t$-derivative at $t = 0$ of $\otimes^k (I + tA)$. The restriction of this operator to the space $\wedge^k \mathcal{H}$ will be of particular interest to us; we will write this restriction as $A^{[k]}$. If $u_1, \ldots, u_k$ are linearly independent eigenvectors of $A$ belonging to eigenvalues $\alpha_1, \ldots, \alpha_k$, then $u_1 \wedge \cdots \wedge u_k$ is an eigenvector of $A^{[k]}$ belonging to eigenvalue $\alpha_1 + \cdots + \alpha_k$.

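Now, fixing an orthonormal basis $(e_1, \ldots, e_n)$ of $\mathcal{H}$, identify $A$ with its matrix $(a_{ij})$. We want to find the matrix representations of $\wedge^k A$ and $\vee^k A$ relative to the standard bases constructed earlier.

The basis of $\wedge^k \mathcal{H}$ we are using is $e_{\mathcal{I}}$, $\mathcal{I} \in Q_{k,n}$. The $(\mathcal{I}, \mathcal{J})$-entry of $\wedge^k A$ is $\langle e_{\mathcal{I}}, (\wedge^k A) e_{\mathcal{J}} \rangle$. One may verify that this is equal to a subdeterminant of $A$. Namely, let $A[\mathcal{I}|\mathcal{J}]$ denote the $k \times k$ matrix obtained from $A$ by expunging all its entries $a_{ij}$ except those for which $i \in \mathcal{I}$ and $j \in \mathcal{J}$. Then, the $(\mathcal{I}, \mathcal{J})$-entry of $\wedge^k A$ is equal to $\det A[\mathcal{I}|\mathcal{J}]$.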
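
This description gives a direct recipe for the matrix of $\wedge^k A$: list the multi-indices of $Q_{k,n}$ in lexicographic order and fill in the minors. The following sketch is an illustration in plain Python with integer entries; the names `compound`, `det`, `matmul` and the sample matrices are assumptions of this example. It also checks the multiplicativity $(\wedge^k A)(\wedge^k B) = \wedge^k(AB)$ on one instance, which in matrix terms is the Cauchy-Binet formula.

```python
from itertools import combinations, permutations

def sign(perm):
    """Signature of a permutation given as a tuple of 0-based indices."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def det(M):
    """Determinant by the permutation expansion (adequate for small matrices)."""
    total = 0
    for p in permutations(range(len(M))):
        term = sign(p)
        for i in range(len(M)):
            term *= M[i][p[i]]
        total += term
    return total

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def compound(A, k):
    """Matrix of the k-th antisymmetric power of A in the basis e_I, I in Q_{k,n}.

    The (I, J) entry is det A[I|J], the minor with rows I and columns J;
    multi-indices are listed in lexicographic order.
    """
    index_sets = list(combinations(range(len(A)), k))
    return [[det([[A[i][j] for j in J] for i in I]) for J in index_sets]
            for I in index_sets]

A = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]
B = [[1, 0, 1], [2, 1, 0], [0, 1, 1]]

print(compound(A, 2))
print(compound(matmul(A, B), 2) == matmul(compound(A, 2), compound(B, 2)))
print(compound(A, 3) == [[det(A)]])   # for k = n the single entry is det A
```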

The special case $k = n$ leads to the 1-dimensional space $\wedge^n \mathcal{H}$. The operator $\wedge^n A$ on this space is just the operator of multiplication by the number $\det A$. We can thus think of $\det A$ as being equal to $\wedge^n A$.

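The basis of $\vee^k \mathcal{H}$ we are using is $m(\mathcal{I})^{-1/2} e_{\mathcal{I}}$, $\mathcal{I} \in G_{k,n}$. The $(\mathcal{I}, \mathcal{J})$-entry of the matrix $\vee^k A$ can be computed as before, and the result is somewhat similar to that for $\wedge^k A$. For $\mathcal{I} = (i_1, \ldots, i_k)$ and $\mathcal{J} = (j_1, \ldots, j_k)$ in $G_{k,n}$, let $A[\mathcal{I}|\mathcal{J}]$ now denote the $k \times k$ matrix whose $(r, s)$-entry is the $(i_r, j_s)$-entry of $A$. Since repetitions of indices are allowed in $\mathcal{I}$ and $\mathcal{J}$, this is not a submatrix of $A$ this time. One verifies that the $(\mathcal{I}, \mathcal{J})$-entry of $\vee^k A$ is $(m(\mathcal{I}) m(\mathcal{J}))^{-1/2} \operatorname{per} A[\mathcal{I}|\mathcal{J}]$.

In particular, $\operatorname{per} A$ is one of the diagonal entries of $\vee^n A$: the $(\mathcal{I}, \mathcal{I})$-entry for $\mathcal{I} = (1, 2, \ldots, n)$.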


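Exercise I.5.7 Prove that for any vectors $u_1, \ldots, u_k, v_1, \ldots, v_k$ we have
$$|\det(\langle u_i, v_j \rangle)|^2 \le \det(\langle u_i, u_j \rangle) \det(\langle v_i, v_j \rangle),$$
$$|\operatorname{per}(\langle u_i, v_j \rangle)|^2 \le \operatorname{per}(\langle u_i, u_j \rangle) \operatorname{per}(\langle v_i, v_j \rangle).$$


Exercise I.5.8 Prove that for any two matrices $A, B$ we have
$$|\operatorname{per}(AB)|^2 \le \operatorname{per}(AA^*) \operatorname{per}(B^*B).$$
(The corresponding relation for determinants is an easy equality.)


Exercise I.5.9 (Schur's Theorem) If $A$ is positive, then
$$\operatorname{per} A \ge \det A.$$
[Hint: Using Exercise I.2.2 write $A = T^*T$ for an upper triangular $T$. Then use the preceding exercise cleverly.]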
We have observed earlier that for any vectors $x_1, \ldots, x_k$ in $\mathcal{H}$ we have
$$\det(\langle x_i, x_j \rangle) = \| x_1 \wedge \cdots \wedge x_k \|^2.$$
When $\mathcal{H} = \mathbb{R}^n$, this determinant is also the square of the $k$-dimensional volume of the parallelepiped having $x_1, \ldots, x_k$ as its sides. To see this, note that neither the determinant nor the volume in question is altered if we add to any of these vectors a linear combination of the others. Performing such operations successively, we can reach an orthogonal set of vectors, some of which might be zero. In this case it is obvious that the determinant is equal to the square of the volume; hence that was true initially too.

Given any $k$-tuple $X = (x_1, \ldots, x_k)$, the matrix $(\langle x_i, x_j \rangle) = X^*X$ is called the Gram matrix of the vectors $x_j$; its determinant is called their Gram determinant.
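
For two vectors in $\mathbb{R}^3$ the volume statement can be compared with the familiar cross product, whose squared length is the squared area of the parallelogram spanned by the two vectors. The sketch below is an illustration only, in plain Python with integer entries; the helper names and the sample vectors are assumptions of this example. It also checks that the Gram determinant does not change when a multiple of one vector is added to the other, which is the step used in the argument above.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram(vectors):
    """Gram matrix of a list of real vectors."""
    return [[dot(x, y) for y in vectors] for x in vectors]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def cross(x, y):
    """Cross product in R^3; its length is the area of the parallelogram."""
    return [x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0]]

x = [1, 2, 2]
y = [3, 0, 4]
gram_det = det2(gram([x, y]))
area_squared = dot(cross(x, y), cross(x, y))

# Adding a multiple of y to x changes neither the area nor the Gram determinant.
x_sheared = [a + 5 * b for a, b in zip(x, y)]
gram_det_sheared = det2(gram([x_sheared, y]))

print(gram_det, area_squared, gram_det_sheared)
```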


Exercise I.5.10 Every $k \times k$ positive matrix $A = (a_{ij})$ can be realised as a Gram matrix, i.e., vectors $x_j$, $1 \le j \le k$, can be found so that $a_{ij} = \langle x_i, x_j \rangle$ for all $i, j$.


I.6 Problems


Problem I.6.1. Given a basis $U = (u_1, \ldots, u_n)$, not necessarily orthonormal, in $\mathcal{H}$, how would you compute the biorthogonal basis $(v_1, \ldots, v_n)$? Find a formula that expresses $\langle v_j, x \rangle$ for each $x \in \mathcal{H}$ and $j = 1, 2, \ldots, k$ in terms of Gram matrices.


Problem I.6.2. A proof of the Toeplitz-Hausdorff Theorem is outlined below. Fill in the details.

Note that $W(A) = \{\langle x, Ax \rangle : \|x\| = 1\} = \{\operatorname{tr} Axx^* : x^*x = 1\}$. It is enough to consider the special case $\dim \mathcal{H} = 2$. In higher dimensions, this special case can be used to show that if $x, y$ are any two vectors, then any point on the line segment joining $\langle x, Ax \rangle$ and $\langle y, Ay \rangle$ can be represented as $\langle z, Az \rangle$, where $z$ is a vector in the linear span of $x$ and $y$. Now, on the space of $2 \times 2$ Hermitian matrices consider the linear map $\Phi(T) = \operatorname{tr} AT$. This is a real linear map from a space of 4 real dimensions (the $2 \times 2$ Hermitian matrices) to a space of 2 real dimensions (the complex plane). We want to prove that $\Phi$ maps the set of 1-dimensional orthoprojectors $xx^*$ onto a convex set. The set of these projectors in matrix form is
$$\begin{pmatrix} \cos t \\ e^{-iw} \sin t \end{pmatrix} \begin{pmatrix} \cos t & e^{iw} \sin t \end{pmatrix} = \frac{1}{2} I + \frac{1}{2} \begin{pmatrix} \cos 2t & e^{iw} \sin 2t \\ e^{-iw} \sin 2t & -\cos 2t \end{pmatrix}.$$
This is a 2-sphere centred at $\begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{pmatrix}$ and having radius $1/\sqrt{2}$ in the Frobenius norm. The image of a 2-sphere under a linear map with range in $\mathbb{R}^2$ must be either an ellipse with interior, or a line segment, or a point; in any case, a convex set.


Problem I.6.3. By the remarks in Section 5, vectors $x_1, \ldots, x_k$ are linearly dependent if and only if $x_1 \wedge \cdots \wedge x_k = 0$. This relationship between linear dependence and the antisymmetric tensor product goes further. Two sets $\{x_1, \ldots, x_k\}$ and $\{y_1, \ldots, y_k\}$ of linearly independent vectors have the same linear span if and only if $x_1 \wedge \cdots \wedge x_k = c\, y_1 \wedge \cdots \wedge y_k$ for some constant $c$. Thus, there is a one-to-one correspondence between $k$-dimensional subspaces of a vector space $W$ and 1-dimensional subspaces of $\wedge^k W$ generated by elementary tensors $x_1 \wedge \cdots \wedge x_k$.


Problem I.6.4. How large must $\dim W$ be in order that there exist some element of $\wedge^2 W$ which is not elementary?


Problem I.6.5. Every vector $w$ of $W$ induces a linear operator $T_w$ from $\wedge^k W$ to $\wedge^{k+1} W$ as follows. $T_w$ is defined on elementary tensors as $T_w(v_1 \wedge \cdots \wedge v_k) = v_1 \wedge \cdots \wedge v_k \wedge w$, and then extended linearly to all of $\wedge^k W$. It is, then, natural to write $T_w(x) = x \wedge w$ for any $x \in \wedge^k W$. Show that a nonzero vector $x$ in $\wedge^k W$ is elementary if and only if the space $\{w \in W : x \wedge w = 0\}$ is $k$-dimensional.

(When $W$ is a Hilbert space, the operators $T_w$ are called creation operators and their adjoints are called annihilation operators in the physics literature.)


Problem I.6.6. (The $n$-dimensional Pythagorean Theorem) Let $x_1, \ldots, x_n$ be orthogonal vectors in $\mathbb{R}^n$. Consider the $n$-dimensional simplex $S$ with vertices $0, x_1, \ldots, x_n$. Think of the $(n-1)$-dimensional simplex with vertices $x_1, \ldots, x_n$ as the "hypotenuse" of $S$ and the remaining $(n-1)$-dimensional faces of $S$ as its "legs". By the remarks in Section 5, the $k$-dimensional volume of the simplex formed by any $k$ points $y_1, \ldots, y_k$ together with the origin is $(k!)^{-1} \| y_1 \wedge \cdots \wedge y_k \|$. The volume of a simplex not having 0 as a vertex can be found by translating it. Use this to prove that the square of the volume of the hypotenuse of $S$ is the sum of the squares of the volumes of the $n$ legs.


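Problem I.6.7. (i) Let $Q_\wedge$ be the inclusion map from $\wedge^k \mathcal{H}$ into $\otimes^k \mathcal{H}$ (so that $Q_\wedge^*$ equals the projection $P_\wedge$ defined earlier) and let $Q_\vee$ be the inclusion map from $\vee^k \mathcal{H}$ into $\otimes^k \mathcal{H}$. Then, for any $A \in \mathcal{L}(\mathcal{H})$
$$\wedge^k A = P_\wedge (\otimes^k A) Q_\wedge,$$
$$\vee^k A = P_\vee (\otimes^k A) Q_\vee.$$

(ii) $\| \wedge^k A \| \le \|A\|^k$, $\| \vee^k A \| \le \|A\|^k$.

(iii) $|\det A| \le \|A\|^n$, $|\operatorname{per} A| \le \|A\|^n$.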
Problem I.6.8. For an invertible operator $A$ obtain a relationship between $A^{-1}$, $\wedge^n A$, and $\wedge^{n-1} A$.


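Problem I.6.9. (i) Let $\{e_1, \ldots, e_n\}$ and $\{f_1, \ldots, f_n\}$ be two orthonormal bases in $\mathcal{H}$. Show that
$$|\langle e_2 \wedge \cdots \wedge e_n, f_2 \wedge \cdots \wedge f_n \rangle|^2 = |\langle e_1, f_1 \rangle|^2.$$

(ii) Let $P$ and $Q$ be orthogonal projections in $\mathcal{H}$, each of rank $n - 1$. Let $x, y$ be unit vectors such that $Px = Qy = 0$. Show that
$$\wedge^{n-1}(PQP) = |\langle x, y \rangle|^2 \wedge^{n-1} P.$$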
Problem I.6.10. If the characteristic polynomial of $A$ is written as
$$t^n + a_1 t^{n-1} + \cdots + a_n,$$
then the coefficient $a_k$ is, up to the sign $(-1)^k$, the sum of all $k \times k$ principal minors of $A$. This sum is equal to $\operatorname{tr} \wedge^k A$.
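
The relation between the coefficients of the characteristic polynomial and the principal minors can be checked in exact arithmetic. The sketch below is an illustration only. It assumes the convention $\det(tI - A) = t^n + a_1 t^{n-1} + \cdots + a_n$, under which $a_k$ equals $(-1)^k$ times the sum of the $k \times k$ principal minors, and it computes the coefficients by the Faddeev-LeVerrier recursion, a technique brought in here for convenience and not taken from the text. All function names and the sample matrix are assumptions of this example.

```python
from fractions import Fraction
from itertools import combinations, permutations

def sign(perm):
    """Signature of a permutation given as a tuple of 0-based indices."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def det(M):
    """Determinant by the permutation expansion (adequate for small matrices)."""
    total = 0
    for p in permutations(range(len(M))):
        term = sign(p)
        for i in range(len(M)):
            term *= M[i][p[i]]
        total += term
    return total

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly_coeffs(A):
    """Coefficients [a_1, ..., a_n] of det(tI - A) = t^n + a_1 t^(n-1) + ... + a_n,
    computed by the Faddeev-LeVerrier recursion in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = []
    for k in range(1, n + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

def principal_minor_sum(A, k):
    """Sum of all k x k principal minors of A, i.e. the trace of the k-th
    antisymmetric power of A."""
    return sum(det([[A[i][j] for j in I] for i in I])
               for I in combinations(range(len(A)), k))

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
coeffs = char_poly_coeffs(A)
minor_sums = [principal_minor_sum(A, k) for k in range(1, 4)]
print(coeffs, minor_sums)
```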


Problem I.6.11. (i) For any $A, B \in \mathcal{L}(\mathcal{H})$ we have
$$\otimes^k A - \otimes^k B = \sum_{j=1}^{k} C_j,$$
where
$$C_j = (\otimes^{k-j} A) \otimes (A - B) \otimes (\otimes^{j-1} B).$$
Hence,
$$\| \otimes^k A - \otimes^k B \| \le k M^{k-1} \|A - B\|,$$
where $M = \max(\|A\|, \|B\|)$.

(ii) The norms of $\wedge^k A - \wedge^k B$ and $\vee^k A - \vee^k B$ are therefore also bounded by $k M^{k-1} \|A - B\|$.

(iii) For $n \times n$ matrices $A, B$,
$$|\det A - \det B| \le n M^{n-1} \|A - B\|,$$
$$|\operatorname{per} A - \operatorname{per} B| \le n M^{n-1} \|A - B\|.$$

(iv) The example $A = \alpha I$, $B = (\alpha + \varepsilon) I$ for small $\varepsilon$ shows that these inequalities are sometimes sharp. When $\|A\|$ and $\|B\|$ are far apart, find a simple improvement on them.

(v) If $A, B$ are $n \times n$ matrices with characteristic polynomials
$$t^n + a_1 t^{n-1} + \cdots + a_n,$$
$$t^n + b_1 t^{n-1} + \cdots + b_n,$$
respectively, then
$$|a_k - b_k| \le k \binom{n}{k} M^{k-1} \|A - B\|,$$
where $M = \max(\|A\|, \|B\|)$.
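
The telescoping identity in part (i) of Problem I.6.11 can be verified mechanically for small matrices. The sketch below is an illustration in plain Python with integer entries; the names `kron`, `kron_all`, `telescoping_terms` and the sample matrices are assumptions of this example, and an empty tensor power is represented by the $1 \times 1$ matrix $(1)$.

```python
from functools import reduce

def kron(A, B):
    """Matrix of A (x) B in the lexicographically ordered basis."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def kron_all(factors):
    """Tensor product of a list of matrices; the empty product is (1)."""
    return reduce(kron, factors, [[1]])

def sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def telescoping_terms(A, B, k):
    """The terms C_j = (k-j copies of A) (x) (A - B) (x) (j-1 copies of B), j = 1..k."""
    D = sub(A, B)
    return [kron_all([A] * (k - j) + [D] + [B] * (j - 1)) for j in range(1, k + 1)]

A = [[1, 2], [0, 1]]
B = [[0, 1], [1, 1]]
k = 3

terms = telescoping_terms(A, B, k)
total = reduce(add, terms)
difference = sub(kron_all([A] * k), kron_all([B] * k))
print(total == difference)
```

Since each $C_j$ has norm at most $M^{k-1}\|A - B\|$, the norm bound stated in the problem follows from this identity by the triangle inequality.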


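Problem I.6.12. Let $A, B$ be positive operators with $A \ge B$ (i.e., $A - B$ is positive). Show that
$$\otimes^k A \ge \otimes^k B,$$
$$\wedge^k A \ge \wedge^k B,$$
$$\vee^k A \ge \vee^k B,$$
$$\det A \ge \det B,$$
$$\operatorname{per} A \ge \operatorname{per} B.$$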


Problem I.6.13. The Schur product or the Hadamard product of two matrices $A$ and $B$ is defined to be the matrix $A \circ B$ whose $(i, j)$-entry is $a_{ij} b_{ij}$. Show that this is a principal submatrix of $A \otimes B$, and derive from this fact two significant properties:

(i) $\|A \circ B\| \le \|A\| \, \|B\|$ for all $A, B$.

(ii) If $A, B$ are positive, then so is $A \circ B$. (This is called Schur's Theorem.)
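
The first claim of Problem I.6.13 can be seen concretely. In the lexicographically ordered basis $e_i \otimes e_j$ of the tensor product, the entrywise product occupies the rows and columns labelled by the pairs $(i, i)$. The sketch below is an illustration in plain Python for square integer matrices of the same size; the function names and the sample matrices are assumptions of this example.

```python
def kron(A, B):
    """Matrix of A (x) B in the lexicographically ordered basis."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def hadamard(A, B):
    """Entrywise (Schur) product of two matrices of the same size."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def principal_submatrix(M, positions):
    """Submatrix of M with the same set of row and column positions."""
    return [[M[i][j] for j in positions] for i in positions]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
n = len(A)

# The pair (i, i) sits at position i * n + i in the lexicographic order (0-based).
diagonal_positions = [i * n + i for i in range(n)]
extracted = principal_submatrix(kron(A, B), diagonal_positions)
print(extracted == hadamard(A, B))
```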


Problem I.6.14. (i) Let $A = (a_{ij})$ be an $n \times n$ positive matrix. Let
$$r_i = \sum_{j=1}^{n} a_{ij}, \quad 1 \le i \le n,$$
$$s = \sum_{i,j} a_{ij}.$$
Show that
$$s^n \operatorname{per} A \ge n! \prod_{i=1}^{n} |r_i|^2.$$
[Hint: Represent $A$ as the Gram matrix of some vectors $x_1, \ldots, x_n$, as in Exercise I.5.10. Let $u = s^{-1/2}(x_1 + \cdots + x_n)$. Consider the vectors $u \vee u \vee \cdots \vee u$ and $x_1 \vee \cdots \vee x_n$, and use the Cauchy-Schwarz inequality.]

(ii) Show that equality holds in the above inequality if and only if either $A$ has rank 1 or $A$ has a row of zeroes.

(iii) If in addition all $a_{ij}$ are nonnegative and all $r_i = 1$ (so that the matrix $A$ is doubly stochastic as well as positive semidefinite), then
$$\operatorname{per} A \ge \frac{n!}{n^n}.$$
Here equality holds if and only if $a_{ij} = \frac{1}{n}$ for all $i, j$.


Problem I.6.15. Let $A$ be Hermitian with eigenvalues $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$. In Exercise I.2.7 we noted that
$$\alpha_1 = \max\{\langle x, Ax \rangle : \|x\| = 1\},$$
$$\alpha_n = \min\{\langle x, Ax \rangle : \|x\| = 1\}.$$
Using these relations and tensor products, we can deduce some other extremal representations:

(i) For every $k = 1, 2, \ldots, n$,
$$\sum_{j=1}^{k} \alpha_j = \max \sum_{j=1}^{k} \langle x_j, Ax_j \rangle,$$
$$\sum_{j=n-k+1}^{n} \alpha_j = \min \sum_{j=1}^{k} \langle x_j, Ax_j \rangle,$$
where the maximum and the minimum are taken over all choices of orthonormal $k$-tuples $(x_1, \ldots, x_k)$ in $\mathcal{H}$. The first statement is referred to as Ky Fan's Maximum Principle. It will reappear in Chapter II (with a different proof) and subsequently.

(ii) If $A$ is positive, then for every $k = 1, 2, \ldots, n$,
$$\prod_{j=n-k+1}^{n} \alpha_j = \min \prod_{j=1}^{k} \langle x_j, Ax_j \rangle,$$
where the minimum is taken over all choices of orthonormal $k$-tuples $(x_1, \ldots, x_k)$ in $\mathcal{H}$.

[Hint: You may need to use the Hadamard Determinant Theorem, which says that the determinant of a positive matrix is bounded above by the product of its diagonal entries. This is also proved in Chapter II.]

(iii) If $A$ is positive, then for every $\mathcal{I} \in Q_{k,n}$
$$\prod_{j=n-k+1}^{n} \alpha_j \le \det A[\mathcal{I}|\mathcal{I}] \le \prod_{j=1}^{k} \alpha_j.$$


Problem I.6.16. Let $A$ be any $n \times n$ matrix with eigenvalues $\alpha_1, \ldots, \alpha_n$. Show that
$$\left| \alpha_j - \frac{\operatorname{tr} A}{n} \right| \le \left[ \frac{n-1}{n} \left( \|A\|_2^2 - \frac{|\operatorname{tr} A|^2}{n} \right) \right]^{1/2}$$
for all $j = 1, 2, \ldots, n$. (Results such as this are interesting because they give some information about the location of the eigenvalues of a matrix in terms of more easily computable functions like the Frobenius norm $\|A\|_2$ and the trace. We will see several such statements later.)

[Hint: First prove that if $z = (z_1, \ldots, z_n)$ is a vector with $z_1 + \cdots + z_n = 0$, then
$$\max_j |z_j| \le \left( \frac{n-1}{n} \right)^{1/2} \|z\|.]$$

Problem I.6.17. (i) Let $z_1, z_2, z_3$ be three points on the unit circle. Then, the numerical range of an operator $A$ is contained in the triangle with vertices $z_1, z_2, z_3$ if and only if $A$ can be expressed as $A = z_1 A_1 + z_2 A_2 + z_3 A_3$, where $A_1, A_2, A_3$ are positive operators with $A_1 + A_2 + A_3 = I$.

[Hint: It is easy to see that if $A$ is a sum of this form, then $W(A)$ is contained in the given triangle. The converse needs some work to prove. Let $z$ be any point in the given triangle. Then, one can find $\alpha_1, \alpha_2, \alpha_3$ such that $\alpha_j \ge 0$, $\alpha_1 + \alpha_2 + \alpha_3 = 1$ and $z = \alpha_1 z_1 + \alpha_2 z_2 + \alpha_3 z_3$. These are the "barycentric coordinates" of $z$ and can be obtained as follows. Let $\gamma = \operatorname{Im}(\bar{z}_1 z_2 + \bar{z}_2 z_3 + \bar{z}_3 z_1)$. Then, for $j = 1, 2, 3$,
$$\alpha_j = \operatorname{Im} \frac{(z - z_{j+1})(\bar{z}_{j+2} - \bar{z}_{j+1})}{\gamma},$$
where the subscript indices are counted modulo 3. Put
$$A_j = \operatorname{Im} \frac{(A - z_{j+1} I)(\bar{z}_{j+2} - \bar{z}_{j+1})}{\gamma}.$$
Then, $A_j$ have the required properties.]
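(ii) Let $W(A)$ be contained in a triangle with vertices $z_1, z_2, z_3$ lying on the unit circle. Then, choosing $A_1, A_2, A_3$ as above, write
$$\begin{pmatrix} I & A^* \\ A & I \end{pmatrix} = \sum_{j=1}^{3} \begin{pmatrix} A_j & \bar{z}_j A_j \\ z_j A_j & A_j \end{pmatrix} = \sum_{j=1}^{3} \begin{pmatrix} 1 & \bar{z}_j \\ z_j & 1 \end{pmatrix} \otimes A_j.$$
This, being a sum of three positive matrices, is positive. Hence, by Proposition I.3.5, $A$ is a contraction.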

(iii) If $W(A)$ is contained in a triangle with vertices $z_1, z_2, z_3$, then $\|A\| \le \max |z_j|$. This is Mirman's Theorem.
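
The formula for the barycentric coordinates in the hint to part (i) is easy to test numerically. The sketch below is an illustration using Python's built-in complex numbers; the function name `barycentric`, the sample triangle with vertices $1, i, -1$ and the sample point are assumptions of this example. The same formula, with $z$ replaced by an operator and $\operatorname{Im} T = (T - T^*)/2i$, is what defines the operators $A_j$.

```python
def barycentric(z, z1, z2, z3):
    """Barycentric coordinates of z with respect to the triangle z1, z2, z3.

    Uses gamma = Im(conj(z1) z2 + conj(z2) z3 + conj(z3) z1) and
    alpha_j = Im[(z - z_{j+1}) conj(z_{j+2} - z_{j+1})] / gamma,
    with indices counted modulo 3.
    """
    vertices = [z1, z2, z3]
    gamma = (z1.conjugate() * z2 + z2.conjugate() * z3 + z3.conjugate() * z1).imag
    alphas = []
    for j in range(3):
        a = vertices[(j + 1) % 3]
        b = vertices[(j + 2) % 3]
        alphas.append(((z - a) * (b - a).conjugate()).imag / gamma)
    return alphas

z1, z2, z3 = 1 + 0j, 1j, -1 + 0j      # three points on the unit circle
z = 0.2 + 0.3j                        # a point inside the triangle
alphas = barycentric(z, z1, z2, z3)
reconstructed = alphas[0] * z1 + alphas[1] * z2 + alphas[2] * z3

print(alphas)
print(abs(reconstructed - z) < 1e-12)
```

At a vertex the coordinates reduce to a unit vector; for instance `barycentric(z2, z1, z2, z3)` has its weight concentrated on the second entry, up to rounding.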


Problem I.6.18. If an operator $T$ has the Cartesian decomposition $T = A + iB$ with $A$ and $B$ positive, then
$$\|T\|^2 \le \|A\|^2 + \|B\|^2.$$
Show that, if $A$ or $B$ is not positive then this need not be true.

[Hint: To prove the above inequality note that $W(T)$ is contained in a rectangle in the first quadrant. Find a suitable triangle that contains it and use Mirman's Theorem.]
II


Majorisation and Doubly Stochastic Matrices


Comparison of two vector quantities often leads to interesting inequali- 
ties that can be expressed succinctly as “majorisation” relations. There is 
an intimate relation between majorisation and doubly stochastic matrices. 
These topics are studied in detail here. We place special emphasis on ma- 
jorisation relations between the eigenvalue n-tuples of two matrices. This 
will be a recurrent theme in the book. 


II.1 Basic Notions 


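Let $x = (x_1, \ldots, x_n)$ be an element of $\mathbb{R}^n$. Let $x^{\downarrow}$ and $x^{\uparrow}$ be the vectors obtained by rearranging the coordinates of $x$ in the decreasing and the increasing orders, respectively. Thus, if $x^{\downarrow} = (x_1^{\downarrow}, \ldots, x_n^{\downarrow})$, then $x_1^{\downarrow} \ge \cdots \ge x_n^{\downarrow}$. Similarly, if $x^{\uparrow} = (x_1^{\uparrow}, \ldots, x_n^{\uparrow})$, then $x_1^{\uparrow} \le \cdots \le x_n^{\uparrow}$. Note that

\[ x_j^{\uparrow} = x_{n-j+1}^{\downarrow}, \quad 1 \le j \le n. \tag{II.1} \]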


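Let $x, y \in \mathbb{R}^n$. We say that $x$ is majorised by $y$, in symbols $x \prec y$, if

\[ \sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow}, \quad 1 \le k \le n, \tag{II.2} \]

and

\[ \sum_{j=1}^{n} x_j^{\downarrow} = \sum_{j=1}^{n} y_j^{\downarrow}. \tag{II.3} \]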
Example: If $x_i \ge 0$ and $\sum x_i = 1$, then

\[ \left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right) \prec (x_1, \ldots, x_n) \prec (1, 0, \ldots, 0). \]


The notion of majorisation occurs naturally in various contexts. For example, in physics, the relation $x \prec y$ is interpreted to mean that the vector $x$ describes a "more chaotic" state than $y$. (Think of $x_i$ as the probability of a system being found in state $i$.) Another example occurs in economics. If $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ denote incomes of individuals $1, 2, \ldots, n$, then $x \prec y$ would mean that there is a more equal distribution of incomes in the state $x$ than in $y$. The above example illustrates this.
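The conditions (II.2) and (II.3) translate directly into a short computation. The following is a minimal Python sketch, not part of the original text; the function names `weakly_submajorised` and `majorised` are illustrative choices, and exact rational arithmetic is assumed so that the equality (II.3) can be tested without rounding error.

```python
from fractions import Fraction
from itertools import accumulate

def partial_sums_decreasing(x):
    """Partial sums of the coordinates of x after sorting them in decreasing order."""
    return list(accumulate(sorted((Fraction(v) for v in x), reverse=True)))

def weakly_submajorised(x, y):
    """Condition (II.2): each partial sum for x is at most the one for y."""
    sx, sy = partial_sums_decreasing(x), partial_sums_decreasing(y)
    return len(sx) == len(sy) and all(a <= b for a, b in zip(sx, sy))

def majorised(x, y):
    """Conditions (II.2) and (II.3) together: x is majorised by y."""
    return weakly_submajorised(x, y) and sum(map(Fraction, x)) == sum(map(Fraction, y))

# The example above with n = 4: the uniform vector, a probability vector x,
# and the vector (1, 0, 0, 0).
n = 4
uniform = [Fraction(1, n)] * n
x = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
extreme = [1, 0, 0, 0]
print(majorised(uniform, x), majorised(x, extreme), majorised(extreme, x))
```

The last call shows that the relation is not symmetric: the vector $(1,0,0,0)$ is not majorised by $x$.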

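From (II.1) we have

\[ \sum_{j=1}^{k} x_j^{\uparrow} = \sum_{j=1}^{n} x_j - \sum_{j=1}^{n-k} x_j^{\downarrow}. \]

Hence, $x \prec y$ if and only if

\[ \sum_{j=1}^{k} x_j^{\uparrow} \ge \sum_{j=1}^{k} y_j^{\uparrow}, \quad 1 \le k \le n, \tag{II.4} \]

and

\[ \sum_{j=1}^{n} x_j^{\uparrow} = \sum_{j=1}^{n} y_j^{\uparrow}. \tag{II.5} \]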


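Let $e$ denote the vector $(1, 1, \ldots, 1)$, and for any subset $I$ of $\{1, 2, \ldots, n\}$ let $e_I$ denote the vector whose $j$th component is 1 if $j \in I$ and 0 if $j \notin I$. Given a vector $x \in \mathbb{R}^n$, let

\[ \operatorname{tr} x = \sum_{j=1}^{n} x_j = \langle x, e \rangle. \]

Note that

\[ \sum_{j=1}^{k} x_j^{\downarrow} = \max_{|I| = k} \langle x, e_I \rangle, \]

where $|I|$ stands for the number of elements in the set $I$. So, $x \prec y$ if and only if for each subset $I$ of $\{1, 2, \ldots, n\}$ there exists a subset $J$ with $|I| = |J|$ such that

\[ \langle x, e_I \rangle \le \langle y, e_J \rangle \tag{II.6} \]

and

\[ \operatorname{tr} x = \operatorname{tr} y. \tag{II.7} \]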
We say that $x$ is (weakly) submajorised by $y$, in symbols $x \prec_w y$, if condition (II.2) is fulfilled.

Note that in the absence of (II.3), the conditions (II.2) and (II.4) are not equivalent. We say that $x$ is (weakly) supermajorised by $y$, in symbols $x \prec^w y$, if condition (II.4) is fulfilled.


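Exercise II.1.1 (i) $x \prec y \Leftrightarrow x \prec_w y$ and $x \prec^w y$.

(ii) If $\alpha$ is a positive real number, then

\[ x \prec_w y \Rightarrow \alpha x \prec_w \alpha y, \]

\[ x \prec^w y \Rightarrow \alpha x \prec^w \alpha y. \]

(iii) $x \prec_w y \Leftrightarrow -x \prec^w -y$.

(iv) For any real number $\alpha$,

\[ x \prec y \Rightarrow \alpha x \prec \alpha y. \]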


Remark II.1.2 The relations $\prec$, $\prec_w$ and $\prec^w$ are all reflexive and transitive. None of them, however, is a partial order. For example, if $x \prec y$ and $y \prec x$, we can only conclude that $x = Py$, where $P$ is a permutation matrix. If we say that $x \sim y$ whenever $x = Py$ for some permutation matrix $P$, then $\sim$ defines an equivalence relation on $\mathbb{R}^n$. If we denote by $\mathbb{R}^n_{\mathrm{sym}}$ the resulting quotient space, then $\prec$ defines a partial order on this space. This relation is also a partial order on the set $\{x \in \mathbb{R}^n : x_1 \ge \cdots \ge x_n\}$. These statements are true for the relations $\prec_w$ and $\prec^w$ as well.


For $a, b \in \mathbb{R}$, let $a \vee b = \max(a, b)$ and $a \wedge b = \min(a, b)$. For $x, y \in \mathbb{R}^n$, define

\[ x \vee y = (x_1 \vee y_1, \ldots, x_n \vee y_n), \]
\[ x \wedge y = (x_1 \wedge y_1, \ldots, x_n \wedge y_n). \]

Let

\[ x^{+} = x \vee 0, \qquad |x| = x \vee (-x). \]

In other words, $x^{+}$ is the vector obtained from $x$ by replacing the negative coordinates by zeroes, and $|x|$ is the vector obtained by taking the absolute values of all coordinates.


With these notations we can prove the following characterisation of ma- 
jorisation that does not involve rearrangements: 


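Theorem II.1.3 Let $x, y \in \mathbb{R}^n$. Then,

(i) $x \prec_w y$ if and only if for all $t \in \mathbb{R}$

\[ \sum_{j=1}^{n} (x_j - t)^{+} \le \sum_{j=1}^{n} (y_j - t)^{+}. \tag{II.8} \]

(ii) $x \prec^w y$ if and only if for all $t \in \mathbb{R}$

\[ \sum_{j=1}^{n} (t - x_j)^{+} \le \sum_{j=1}^{n} (t - y_j)^{+}. \tag{II.9} \]

(iii) $x \prec y$ if and only if for all $t \in \mathbb{R}$

\[ \sum_{j=1}^{n} |x_j - t| \le \sum_{j=1}^{n} |y_j - t|. \tag{II.10} \]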
j=l j=l 


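Proof. Let $x \prec_w y$. If $t > x_1^{\downarrow}$, then $(x_j - t)^{+} = 0$ for all $j$, and hence (II.8) holds. Let $x_{k+1}^{\downarrow} \le t \le x_k^{\downarrow}$ for some $1 \le k \le n$, where, for convenience, $x_{n+1}^{\downarrow} = -\infty$. Then,

\[ \sum_{j=1}^{n} (x_j - t)^{+} = \sum_{j=1}^{k} (x_j^{\downarrow} - t) = \sum_{j=1}^{k} x_j^{\downarrow} - kt \le \sum_{j=1}^{k} y_j^{\downarrow} - kt \le \sum_{j=1}^{k} (y_j^{\downarrow} - t)^{+} \le \sum_{j=1}^{n} (y_j - t)^{+}, \]

and, hence, (II.8) holds.

To prove the converse, note that if $t = y_k^{\downarrow}$, then

\[ \sum_{j=1}^{n} (y_j - t)^{+} = \sum_{j=1}^{k} (y_j^{\downarrow} - t) = \sum_{j=1}^{k} y_j^{\downarrow} - kt. \]

But

\[ \sum_{j=1}^{k} x_j^{\downarrow} - kt = \sum_{j=1}^{k} (x_j^{\downarrow} - t) \le \sum_{j=1}^{k} (x_j^{\downarrow} - t)^{+} \le \sum_{j=1}^{n} (x_j^{\downarrow} - t)^{+} = \sum_{j=1}^{n} (x_j - t)^{+}. \]

So, if (II.8) holds, then $\sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow}$ for each $k$; i.e., $x \prec_w y$.

This proves (i). The statements (ii) and (iii) have similar proofs. ∎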


Corollary II.1.4 If $x \prec y$ in $\mathbb{R}^n$ and $u \prec w$ in $\mathbb{R}^m$, then $(x, u) \prec (y, w)$ in $\mathbb{R}^{n+m}$. In particular, $x \prec y$ if and only if $(x, u) \prec (y, u)$ for all $u$.
An $n \times n$ matrix $A = (a_{ij})$ is called doubly stochastic if

\[ a_{ij} \ge 0 \quad \text{for all } i, j, \tag{II.11} \]
\[ \sum_{i=1}^{n} a_{ij} = 1 \quad \text{for all } j, \tag{II.12} \]
\[ \sum_{j=1}^{n} a_{ij} = 1 \quad \text{for all } i. \tag{II.13} \]


Exercise II.1.5 A linear map $A$ on $\mathbb{C}^n$ is called positivity-preserving if it carries vectors with nonnegative coordinates to vectors with nonnegative coordinates. It is called trace-preserving if $\operatorname{tr} Ax = \operatorname{tr} x$ for all $x$. It is called unital if $Ae = e$. Show that a matrix $A$ is doubly stochastic if and only if the linear operator $A$ is positivity-preserving, trace-preserving and unital. Show that $A$ is trace-preserving if and only if its adjoint $A^*$ is unital.


Exercise II.1.6 (i) The class of $n \times n$ doubly stochastic matrices is a convex set and is closed under multiplication and the adjoint operation. It is, however, not a group.

(ii) Every permutation matrix is doubly stochastic and is an extreme point of the convex set of all doubly stochastic matrices. (Later we will prove Birkhoff's Theorem, which says that all extreme points of this convex set are permutation matrices.)


Exercise II.1.7 Let $A$ be a doubly stochastic matrix. Show that all eigenvalues of $A$ have modulus less than or equal to 1, that 1 is an eigenvalue of $A$, and that $\|A\| = 1$.


Exercise II.1.8 If $A$ is doubly stochastic, then

\[ |Ax| \le A(|x|), \]

where, as usual, $|x| = (|x_1|, \ldots, |x_n|)$ and we say that $x \le y$ if $x_j \le y_j$ for all $j$.


There is a close relationship between majorisation and doubly stochastic matrices. This is brought out in the next few theorems.


Theorem II.1.9 A matrix $A$ is doubly stochastic if and only if $Ax \prec x$ for all vectors $x$.


Proof. Let $Ax \prec x$ for all $x$. First choosing $x$ to be $e$ and then $e_i = (0, 0, \ldots, 1, 0, \ldots, 0)$, $1 \le i \le n$, one can easily see that $A$ is doubly stochastic.

Conversely, let $A$ be doubly stochastic. Let $y = Ax$. To prove $y \prec x$ we may assume, without loss of generality, that the coordinates of both $x$ and $y$ are in decreasing order. (See Remark II.1.2 and Exercise II.1.6.) Now note that for any $k$, $1 \le k \le n$, we have

\[ \sum_{j=1}^{k} y_j = \sum_{j=1}^{k} \sum_{i=1}^{n} a_{ji} x_i. \]

If we put $t_i = \sum_{j=1}^{k} a_{ji}$, then $0 \le t_i \le 1$ and $\sum_{i=1}^{n} t_i = k$. We have

\[ \begin{aligned}
\sum_{j=1}^{k} y_j - \sum_{j=1}^{k} x_j &= \sum_{i=1}^{n} t_i x_i - \sum_{i=1}^{k} x_i \\
&= \sum_{i=1}^{n} t_i x_i - \sum_{i=1}^{k} x_i + \Bigl(k - \sum_{i=1}^{n} t_i\Bigr) x_k \\
&= \sum_{i=1}^{k} (t_i - 1)(x_i - x_k) + \sum_{i=k+1}^{n} t_i (x_i - x_k) \\
&\le 0.
\end{aligned} \]

Further, when $k = n$ we must have equality here simply because $A$ is doubly stochastic. Thus, $y \prec x$. ∎
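Theorem II.1.9 can be checked on a small instance. The sketch below is an illustration only: the helper names `is_doubly_stochastic`, `apply` and `majorised` are hypothetical, and rational entries are assumed so that the sums in (II.12) and (II.13) compare exactly with 1.

```python
from fractions import Fraction
from itertools import accumulate

def is_doubly_stochastic(A):
    """Check (II.11)-(II.13): nonnegative entries, every row and column sum equal to 1."""
    n = len(A)
    if any(len(row) != n for row in A):
        return False
    nonneg = all(a >= 0 for row in A for a in row)
    rows = all(sum(row) == 1 for row in A)
    cols = all(sum(A[i][j] for i in range(n)) == 1 for j in range(n))
    return nonneg and rows and cols

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def majorised(x, y):
    """x is majorised by y, tested via the partial sums of the decreasing rearrangements."""
    sx = list(accumulate(sorted(x, reverse=True)))
    sy = list(accumulate(sorted(y, reverse=True)))
    return all(a <= b for a, b in zip(sx, sy)) and sx[-1] == sy[-1]

h = Fraction(1, 2)
A = [[h, h, 0], [h, 0, h], [0, h, h]]
x = [Fraction(5), Fraction(-1), Fraction(2)]
y = apply(A, x)          # y = Ax
print(is_doubly_stochastic(A), y, majorised(y, x))
```

Here $y = Ax = (2, 7/2, 1/2)$, whose decreasing partial sums $7/2,\ 11/2,\ 6$ are dominated by those of $x$, namely $5,\ 7,\ 6$.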


Note that if $x, y \in \mathbb{R}^2$ and $x \prec y$ then

\[ (x_1, x_2) = (t y_1 + (1 - t) y_2,\ (1 - t) y_1 + t y_2) \quad \text{for some } 0 \le t \le 1. \]

Note also that if $x, y \in \mathbb{R}^n$ and $x$ is obtained by averaging any two coordinates of $y$ in the above sense while keeping the rest of the coordinates fixed, then $x \prec y$. More precisely, call a linear map $T$ on $\mathbb{R}^n$ a T-transform if there exist $0 \le t \le 1$ and indices $j, k$ such that

\[ Ty = (y_1, \ldots, y_{j-1},\ t y_j + (1 - t) y_k,\ y_{j+1}, \ldots,\ (1 - t) y_j + t y_k,\ y_{k+1}, \ldots, y_n). \]

Then, $Ty \prec y$ for all $y$.
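A T-transform is easy to write down explicitly. The following Python sketch is illustrative only; the function name `t_transform` and the 0-based indexing are assumptions of the sketch, not notation from the text.

```python
from fractions import Fraction

def t_transform(y, j, k, t):
    """Average coordinates j and k of y (0-based indices) with weight 0 <= t <= 1."""
    if not 0 <= t <= 1:
        raise ValueError("t must lie in [0, 1]")
    z = list(y)
    z[j] = t * y[j] + (1 - t) * y[k]
    z[k] = (1 - t) * y[j] + t * y[k]
    return z

y = [Fraction(6), Fraction(3), Fraction(0)]
x = t_transform(y, 0, 2, Fraction(2, 3))
print(x)   # the first and third coordinates of y are averaged; the middle one is fixed
```

With $t = 1$ the map is the identity and with $t = 0$ it is the transposition of the two chosen coordinates, so every T-transform is a convex combination of these two maps.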
Theorem II.1.10 For $x, y \in \mathbb{R}^n$, the following statements are equivalent:

(i) $x \prec y$.

(ii) $x$ is obtained from $y$ by a finite number of T-transforms.

(iii) $x$ is in the convex hull of all vectors obtained by permuting the coordinates of $y$.

(iv) $x = Ay$ for some doubly stochastic matrix $A$.
Proof. When $n = 2$, then (i) $\Rightarrow$ (ii). We will prove this for a general $n$ by induction. Assume that we have this implication for dimensions up to $n - 1$. Let $x, y \in \mathbb{R}^n$. Since $x^{\downarrow}$ and $y^{\downarrow}$ can be obtained from $x$ and $y$ by permutations, and each permutation is a product of transpositions, which are surely T-transforms, we can assume without loss of generality that $x_1 \ge x_2 \ge \cdots \ge x_n$ and $y_1 \ge y_2 \ge \cdots \ge y_n$. Now, if $x \prec y$, then $y_n \le x_1 \le y_1$. Choose $k$ such that $y_k \le x_1 \le y_{k-1}$. Then $x_1 = t y_1 + (1 - t) y_k$ for some $0 \le t \le 1$. Let

\[ T_1 z = (t z_1 + (1 - t) z_k,\ z_2, \ldots, z_{k-1},\ (1 - t) z_1 + t z_k,\ z_{k+1}, \ldots, z_n) \]

for all $z \in \mathbb{R}^n$. Then note that the first coordinate of $T_1 y$ is $x_1$. Let

\[ x' = (x_2, \ldots, x_n), \]
\[ y' = (y_2, \ldots, y_{k-1},\ (1 - t) y_1 + t y_k,\ y_{k+1}, \ldots, y_n). \]

We will show that $x' \prec y'$. Since $y_1 \ge \cdots \ge y_{k-1} \ge x_1 \ge x_2 \ge \cdots \ge x_n$, we have for $2 \le m \le k - 1$

\[ \sum_{j=2}^{m} x_j \le \sum_{j=2}^{m} y_j. \]

For $k \le m \le n$

\[ \begin{aligned}
\sum_{j=2}^{m} y'_j &= \sum_{j=2}^{k-1} y_j + [(1 - t) y_1 + t y_k] + \sum_{j=k+1}^{m} y_j \\
&= \sum_{j=1}^{m} y_j - t y_1 + (t - 1) y_k \\
&= \sum_{j=1}^{m} y_j - x_1 \ \ge\ \sum_{j=1}^{m} x_j - x_1 = \sum_{j=2}^{m} x_j.
\end{aligned} \]

The last inequality is an equality when $m = n$ since $x \prec y$. Thus $x' \prec y'$. So by the induction hypothesis there exist a finite number of T-transforms $T_2, \ldots, T_r$ on $\mathbb{R}^{n-1}$ such that $x' = (T_r \cdots T_2) y'$. We can regard each of them as a T-transform on $\mathbb{R}^n$ if we prohibit them from touching the first coordinate of any vector. We then have

\[ (T_r \cdots T_1) y = (T_r \cdots T_2)(x_1, y') = (x_1, x') = x, \]

and that is what we wanted to prove.

Now note that a T-transform is a convex combination of the identity map and some permutation. So a product of such maps is a convex combination of permutations. Hence (ii) $\Rightarrow$ (iii). The implication (iii) $\Rightarrow$ (iv) is obvious, and (iv) $\Rightarrow$ (i) is a consequence of Theorem II.1.9. ∎




A consequence of the above theorem is that the set $\{x : x \prec y\}$ is the convex hull of all points obtained from $y$ by permuting its coordinates.
Exercise II.1.11 If $U = (u_{ij})$ is a unitary matrix, then the matrix $(|u_{ij}|^2)$ is doubly stochastic. Such a doubly stochastic matrix is called unitary-stochastic; it is called orthostochastic if $U$ is real orthogonal. Show that if $x = Ay$ for some doubly stochastic matrix $A$, then there exists an orthostochastic matrix $B$ such that $x = By$. (Use induction.)


Exercise II.1.12 Let $A$ be an $n \times n$ Hermitian matrix. Let $\operatorname{diag}(A)$ denote the vector whose coordinates are the diagonal entries of $A$ and $\lambda(A)$ the vector whose coordinates are the eigenvalues of $A$ specified in any order. Show that

\[ \operatorname{diag}(A) \prec \lambda(A). \tag{II.14} \]

This is sometimes referred to as Schur's Theorem.


Exercise II.1.13 Use the majorisation (II.14) to prove that if $\lambda_j^{\downarrow}(A)$ denote the eigenvalues of an $n \times n$ Hermitian matrix arranged in decreasing order, then for all $k = 1, 2, \ldots, n$

\[ \sum_{j=1}^{k} \lambda_j^{\downarrow}(A) = \max \sum_{j=1}^{k} \langle x_j, A x_j \rangle, \tag{II.15} \]

where the maximum is taken over all orthonormal $k$-tuples of vectors $\{x_1, \ldots, x_k\}$ in $\mathbb{C}^n$. This is Ky Fan's maximum principle. (See Problem I.6.15 also.) Show that the majorisation (II.14) can be derived from (II.15). The two statements are, thus, equivalent.


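Exercise II.1.14 Let $A, B$ be Hermitian matrices. Then for all $k = 1, 2, \ldots, n$

\[ \sum_{j=1}^{k} \lambda_j^{\downarrow}(A + B) \le \sum_{j=1}^{k} \lambda_j^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B). \tag{II.16} \]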
Exercise II.1.15 For any matrix $A$, let $\tilde{A}$ be the Hermitian matrix

\[ \tilde{A} = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}. \tag{II.17} \]

Then the eigenvalues of $\tilde{A}$ are the singular values of $A$ together with their negatives. Denote the singular values of $A$ arranged in decreasing order by $s_1(A), \ldots, s_n(A)$. Show that for any two $n \times n$ matrices $A, B$ and for any $k = 1, 2, \ldots, n$

\[ \sum_{j=1}^{k} s_j(A + B) \le \sum_{j=1}^{k} s_j(A) + \sum_{j=1}^{k} s_j(B). \tag{II.18} \]

When $k = 1$, this is just the triangle inequality for the operator norm $\|A\|$. For each $1 \le k \le n$, define $\|A\|_{(k)} = \sum_{j=1}^{k} s_j(A)$. From (II.18) it follows that $\|A\|_{(k)}$ defines a norm. These norms are called the Ky Fan $k$-norms.
II.2 Birkhoff's Theorem


We start with a combinatorial problem known as the Matching Problem. Let $B = \{b_1, \ldots, b_n\}$ and $G = \{g_1, \ldots, g_n\}$ be two sets of $n$ elements each, and let $R$ be a subset of $B \times G$. When does there exist a bijection $f$ from $B$ to $G$ whose graph is contained in $R$? This is called the Matching Problem or the Marriage Problem for the following reason. Think of $B$ as a set of boys, $G$ as a set of girls, and $(b_i, g_j) \in R$ as saying that the boy $b_i$ knows the girl $g_j$. Then the above question can be phrased as: when can one arrange a monogamous marriage in which each boy gets married to a girl he knows? We will call such a matching a compatible matching.

For each $i$ let $G_i = \{g_j : (b_i, g_j) \in R\}$. This represents the set of girls whom the boy $b_i$ knows. For each $k$-tuple of indices $1 \le i_1 < \cdots < i_k \le n$, let $G_{i_1 \cdots i_k} = \bigcup_{r=1}^{k} G_{i_r}$. This represents the set of girls each of whom is known to one of the boys $b_{i_1}, \ldots, b_{i_k}$. Clearly a necessary condition for a compatible matching to be possible is that $|G_{i_1 \cdots i_k}| \ge k$ for all $k = 1, 2, \ldots, n$. Hall's Marriage Theorem says that this condition is sufficient as well.


Theorem II.2.1 (Hall) A compatible matching between $B$ and $G$ can be found if and only if

\[ |G_{i_1 \cdots i_k}| \ge k, \tag{II.19} \]

for all $1 \le i_1 < \cdots < i_k \le n$, $k = 1, 2, \ldots, n$.


Proof. Only the sufficiency of the condition needs to be proved. This is done by induction on $n$. Obviously, the Theorem is true when $n = 1$.

First assume that we have

\[ |G_{i_1 \cdots i_k}| \ge k + 1, \]

for all $1 \le i_1 < \cdots < i_k \le n$, $1 \le k < n$. In other words, if $1 \le k < n$, then every set of $k$ boys together knows at least $k + 1$ girls. Pick any boy and marry him to one of the girls he knows. This leaves $n - 1$ boys and $n - 1$ girls; condition (II.19) still holds, and hence the remaining boys and girls can be compatibly matched.

If the above assumption is not met, then there exist $k$ indices $i_1, \ldots, i_k$, $k < n$, for which

\[ |G_{i_1 \cdots i_k}| = k. \]

In other words, there exist $k$ boys who together know exactly $k$ girls. By the induction hypothesis these $k$ boys and girls can be compatibly matched. Now we are left with $n - k$ unmarried boys and as many unmarried girls. If some set of $h$ of these boys knew fewer than $h$ of these remaining girls, then together with the earlier $k$ these $h + k$ boys would have known fewer than $h + k$ girls. (The earlier $k$ boys did not know any of the present $n - k$ maidens.) So, condition (II.19) is satisfied for the remaining $n - k$ boys and girls, who can now be compatibly married by the induction hypothesis. ∎
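For small $n$, both the condition (II.19) and the existence of a compatible matching can be tested by exhaustive search. The sketch below is a brute-force illustration, not the inductive argument of the proof; the names `hall_condition` and `compatible_matching`, and the encoding of boys and girls as the integers $0, \ldots, n-1$, are assumptions of the sketch.

```python
from itertools import combinations, permutations

def hall_condition(knows):
    """knows[i] is the set of girls known to boy i; test (II.19) for every set of boys."""
    n = len(knows)
    for k in range(1, n + 1):
        for boys in combinations(range(n), k):
            if len(set().union(*(knows[i] for i in boys))) < k:
                return False
    return True

def compatible_matching(knows):
    """Search all bijections f for one with f(i) in knows[i]; return None if none exists."""
    n = len(knows)
    for f in permutations(range(n)):
        if all(f[i] in knows[i] for i in range(n)):
            return f
    return None

good = [{0, 1}, {0}, {1, 2}]   # every set of k boys knows at least k girls
bad = [{0}, {0}, {1, 2}]       # boys 0 and 1 together know only one girl
print(hall_condition(good), compatible_matching(good))
print(hall_condition(bad), compatible_matching(bad))
```

Both searches are exponential in $n$; they are meant only to make the statement of the theorem concrete.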


Exercise II.2.2 (The König-Frobenius Theorem) Let $A = (a_{ij})$ be an $n \times n$ matrix. If $\sigma$ is a permutation on $n$ symbols, the set $\{a_{1\sigma(1)}, a_{2\sigma(2)}, \ldots, a_{n\sigma(n)}\}$ is called a diagonal of $A$. Each diagonal contains exactly one element from each row and from each column of $A$. Show that the following two statements are equivalent:

(i) every diagonal of $A$ contains a zero element.

(ii) $A$ has a $k \times \ell$ submatrix with all entries zero for some $k, \ell$ such that $k + \ell > n$.

One can see that the statement of the König-Frobenius Theorem is equivalent to that of Hall's Theorem.


Theorem II.2.3 (Birkhoff's Theorem) The set of $n \times n$ doubly stochastic matrices is a convex set whose extreme points are the permutation matrices.


Proof. We have already made a note of the easy part of this theorem in Exercise II.1.6. The harder part is showing that every extreme point is a permutation matrix. For this we need to show that each doubly stochastic matrix is a convex combination of permutation matrices.

This is proved by induction on the number of positive entries of the matrix. Note that if $A$ is doubly stochastic, then it has at least $n$ positive entries. If the number of positive entries is exactly $n$, then $A$ is a permutation matrix.

We first show that if $A$ is doubly stochastic, then $A$ has at least one diagonal with no zero entry. Choose any $k \times \ell$ submatrix of zeroes that $A$ might have. We can find permutation matrices $P_1, P_2$ such that $P_1 A P_2$ has the form

\[ P_1 A P_2 = \begin{pmatrix} O & B \\ C & D \end{pmatrix}, \]

where $O$ is a $k \times \ell$ matrix with all entries zero. Since $P_1 A P_2$ is again doubly stochastic, the rows of $B$ and the columns of $C$ each add up to 1. Hence $k + \ell \le n$. So at least one diagonal of $A$ must have all its entries positive, by the König-Frobenius Theorem.

Choose any such positive diagonal and let $a$ be the smallest of the elements of this diagonal. If $A$ is not a permutation matrix, then $a < 1$. Let $P$ be the permutation matrix obtained by putting ones on this diagonal and let

\[ B = \frac{A - aP}{1 - a}. \]

Then $B$ is doubly stochastic and has at least one more zero entry than $A$ has. So by the induction hypothesis $B$ is a convex combination of permutation matrices. Hence so is $A$, since $A = (1 - a)B + aP$. ∎
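The induction in this proof is constructive: repeatedly pick a positive diagonal, subtract the largest possible multiple of the corresponding permutation matrix, and continue with what is left. The Python sketch below follows that idea under two simplifying assumptions that are not in the text: it works with the unnormalised remainder $A - aP$ instead of rescaling to $B$, and it finds a positive diagonal by trying all $n!$ permutations, which is only sensible for very small $n$. The names `positive_diagonal` and `birkhoff_decomposition` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def positive_diagonal(A):
    """Return a permutation sigma with A[i][sigma[i]] > 0 for all i, or None."""
    n = len(A)
    for sigma in permutations(range(n)):
        if all(A[i][sigma[i]] > 0 for i in range(n)):
            return sigma
    return None

def birkhoff_decomposition(A):
    """Write a doubly stochastic A as a list of (weight, permutation) pairs."""
    n = len(A)
    rest = [[Fraction(v) for v in row] for row in A]   # exact copy of A
    terms = []
    while True:
        sigma = positive_diagonal(rest)
        if sigma is None:          # the remainder is the zero matrix
            break
        a = min(rest[i][sigma[i]] for i in range(n))   # smallest entry on the diagonal
        terms.append((a, sigma))
        for i in range(n):
            rest[i][sigma[i]] -= a                     # creates at least one new zero
    return terms

h = Fraction(1, 2)
A = [[h, h, 0], [h, 0, h], [0, h, h]]
terms = birkhoff_decomposition(A)
for weight, sigma in terms:
    print(weight, sigma)
```

Each pass zeroes at least one more entry, so the loop stops after finitely many steps, mirroring the induction on the number of positive entries.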
Remark. There are $n!$ permutation matrices of size $n$. Birkhoff's Theorem tells us that every $n \times n$ doubly stochastic matrix is a convex combination of these $n!$ matrices. This number can be reduced as a consequence of a general theorem of Carathéodory. This says that if $X$ is a subset of an $m$-dimensional linear variety in $\mathbb{R}^N$, then any point in the convex hull of $X$ can be expressed as a convex combination of at most $m + 1$ points of $X$. Using this theorem one sees that every $n \times n$ doubly stochastic matrix can be expressed as a convex combination of at most $n^2 - 2n + 2$ permutation matrices.

Doubly substochastic matrices defined below are related to weak majorisation in the same way as doubly stochastic matrices are related to majorisation.

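A matrix $B = (b_{ij})$ is called doubly substochastic if

\[ b_{ij} \ge 0 \quad \text{for all } i, j, \]
\[ \sum_{i=1}^{n} b_{ij} \le 1 \quad \text{for all } j, \]
\[ \sum_{j=1}^{n} b_{ij} \le 1 \quad \text{for all } i. \]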


Exercise II.2.4 $B$ is doubly substochastic if it is positivity-preserving, $Be \le e$, and $B^*e \le e$.


Exercise II.2.5 Every square submatrix of a doubly stochastic matrix is doubly substochastic. Conversely, every doubly substochastic matrix $B$ can be dilated to a doubly stochastic matrix $A$. Moreover, if $B$ is an $n \times n$ matrix, then this dilation $A$ can be chosen to have size at most $2n \times 2n$. Indeed, if $R$ and $C$ are the diagonal matrices whose $j$th diagonal entries are the sums of the $j$th rows and the $j$th columns of $B$, respectively, then

\[ A = \begin{pmatrix} B & I - R \\ I - C & B^* \end{pmatrix} \]

is a doubly stochastic matrix.
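The dilation in Exercise II.2.5 can be built explicitly. The following is a minimal Python sketch for a real matrix $B$ (so that $B^*$ is the transpose); the function name `dilate` is an illustrative choice and rational entries are assumed for exact arithmetic.

```python
from fractions import Fraction

def dilate(B):
    """Build the 2n x 2n matrix [[B, I - R], [I - C, B^T]] from a real n x n matrix B."""
    n = len(B)
    r = [sum(B[i]) for i in range(n)]                        # row sums of B
    c = [sum(B[i][j] for i in range(n)) for j in range(n)]   # column sums of B
    top = [list(B[i]) + [(1 - r[i]) if i == j else 0 for j in range(n)]
           for i in range(n)]
    bottom = [[(1 - c[i]) if i == j else 0 for j in range(n)]
              + [B[j][i] for j in range(n)]
              for i in range(n)]
    return top + bottom

B = [[Fraction(1, 2), Fraction(1, 4)],
     [Fraction(0), Fraction(1, 3)]]      # doubly substochastic
A = dilate(B)
for row in A:
    print([str(v) for v in row])
```

Row $i$ of the top half sums to $r_i + (1 - r_i) = 1$ and column $j$ of the left half sums to $c_j + (1 - c_j) = 1$; the other half is handled the same way with the roles of rows and columns exchanged.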


Exercise II.2.6 The set of all $n \times n$ doubly substochastic matrices is convex; its extreme points are matrices having at most one entry 1 in each row and each column and all other entries zero.

Exercise II.2.7 A matrix $B$ with nonnegative entries is doubly substochastic if and only if there exists a doubly stochastic matrix $A$ such that $b_{ij} \le a_{ij}$ for all $i, j = 1, 2, \ldots, n$.


Our next theorem connects doubly substochastic matrices to weak 
majorisation. 
Theorem II.2.8 (i) Let $x, y$ be two vectors with nonnegative coordinates. Then $x \prec_w y$ if and only if $x = By$ for some doubly substochastic matrix $B$.

(ii) Let $x, y \in \mathbb{R}^n$. Then $x \prec_w y$ if and only if there exists a vector $u$ such that $x \le u$ and $u \prec y$.


Proof. If $x, u \in \mathbb{R}^n$ and $x \le u$, then clearly $x \prec_w u$. So, if in addition $u \prec y$, then $x \prec_w y$.

Now suppose that $x, y$ are nonnegative vectors and $x = By$ for some doubly substochastic matrix $B$. By Exercise II.2.7 we can find a doubly stochastic matrix $A$ such that $b_{ij} \le a_{ij}$ for all $i, j$. Then $x = By \le Ay$. Hence, $x \prec_w y$.

Conversely, let $x, y$ be nonnegative vectors such that $x \prec_w y$. We want to prove that there exists a doubly substochastic matrix $B$ for which $x = By$. If $x = 0$, we can choose $B = 0$, and if $x \prec y$, we can even choose $B$ to be doubly stochastic by Theorem II.1.10. So, assume that neither of these is the case. Let $r$ be the smallest of the positive coordinates of $x$, and let $s = \sum y_j - \sum x_j$. By assumption $s > 0$. Choose a positive integer $m$ such that $r \ge s/m$. Dilate both vectors $x$ and $y$ to $(n + m)$-dimensional vectors $x', y'$ defined as

\[ x' = (x_1, \ldots, x_n, s/m, \ldots, s/m), \]
\[ y' = (y_1, \ldots, y_n, 0, \ldots, 0). \]

Then $x' \prec y'$. Hence $x' = Ay'$ for some doubly stochastic matrix $A$ of size $n + m$. Let $B$ be the $n \times n$ submatrix of $A$ sitting in the top left corner. Then $B$ is doubly substochastic and $x = By$. This proves (i).

Finally, let $x, y \in \mathbb{R}^n$ and $x \prec_w y$. Choose a positive number $t$ so that $x + te$ and $y + te$ are both nonnegative, where $e = (1, 1, \ldots, 1)$. We still have $x + te \prec_w y + te$. So, by (i) there exists a doubly substochastic matrix $B$ such that $x + te = B(y + te)$. By Exercise II.2.7 we can find a doubly stochastic matrix $A$ such that $b_{ij} \le a_{ij}$ for all $i, j$. But then $x + te \le A(y + te) = Ay + te$. Hence, if $u = Ay$, then $x \le u$ and $u \prec y$. ∎


Exercise II.2.9 A matrix $A$ is doubly substochastic if and only if for every $x \ge 0$ we have $Ax \ge 0$ and $Ax \prec_w x$. (Compare with Theorem II.1.9.)

Exercise II.2.10 Let $x, y \in \mathbb{R}^n$ and let $x \ge 0$, $y \ge 0$. Then $x \prec_w y$ if and only if $x$ is in the convex hull of the $2^n n!$ points obtained from $y$ by permutations and sign changes of its coordinates (i.e., vectors of the form $(\pm y_{\sigma(1)}, \pm y_{\sigma(2)}, \ldots, \pm y_{\sigma(n)})$, where $\sigma$ is a permutation).
II.3 Convex and Monotone Functions


In this section we will study maps from $\mathbb{R}^n$ to $\mathbb{R}^m$ that preserve various orders.

Let $f : \mathbb{R} \to \mathbb{R}$ be any function. We will denote the map induced by $f$ on $\mathbb{R}^n$ also by $f$; i.e., $f(x) = (f(x_1), \ldots, f(x_n))$ for $x \in \mathbb{R}^n$. An elementary and useful characterisation of majorisation is the following.


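Theorem II.3.1 Let $x, y \in \mathbb{R}^n$. Then the following two conditions are equivalent:

(i) $x \prec y$.

(ii) $\operatorname{tr} \varphi(x) \le \operatorname{tr} \varphi(y)$ for all convex functions $\varphi$ from $\mathbb{R}$ to $\mathbb{R}$.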


Proof. Let $x \prec y$. Then $x = Ay$ for some doubly stochastic matrix $A$. So $x_i = \sum_{j=1}^{n} a_{ij} y_j$, where $a_{ij} \ge 0$ and $\sum_{j=1}^{n} a_{ij} = 1$. Hence for every convex function $\varphi$, $\varphi(x_i) \le \sum_{j=1}^{n} a_{ij} \varphi(y_j)$. Hence

\[ \sum_{i=1}^{n} \varphi(x_i) \le \sum_{i,j} a_{ij} \varphi(y_j) = \sum_{j=1}^{n} \varphi(y_j). \]

To prove the converse note that for each $t$ the function $\varphi_t(x) = |x - t|$ is convex. Now apply Theorem II.1.3 (iii). ∎
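The forward direction of Theorem II.3.1 is easy to observe numerically. In the Python sketch below, the pair $x = (4,3,2)$, $y = (6,3,0)$ and the four sample convex functions are illustrative choices, and `trace_phi` is a hypothetical helper computing $\operatorname{tr} \varphi(x)$.

```python
def trace_phi(phi, x):
    """tr phi(x): the sum of phi over the coordinates of x."""
    return sum(phi(v) for v in x)

y = [6, 3, 0]
x = [4, 3, 2]        # x is majorised by y: partial sums 4 <= 6, 7 <= 9, 9 = 9
convex_functions = {
    "t^2": lambda t: t * t,
    "|t - 3|": lambda t: abs(t - 3),
    "max(t - 1, 0)": lambda t: max(t - 1, 0),
    "2^t": lambda t: 2 ** t,
}
for name, phi in convex_functions.items():
    print(name, trace_phi(phi, x), trace_phi(phi, y))
```

In every row the first number is at most the second. The functions $|t - 3|$ and $\max(t - 1, 0)$ are of the kind appearing in (II.10) and (II.8), which is how the converse direction is obtained.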


Exercise II.3.2 For $x, y \in \mathbb{R}^n$ the following two conditions are equivalent:

(i) $x \prec_w y$.

(ii) $\operatorname{tr} \varphi(x) \le \operatorname{tr} \varphi(y)$ for all monotonically increasing convex functions $\varphi$ from $\mathbb{R}$ to $\mathbb{R}$.


Note that in the two statements above it suffices to consider only con- 
tinuous functions. 


A real valued function $\varphi$ on $\mathbb{R}^n$ is called Schur-convex or S-convex if

\[ x \prec y \ \Rightarrow\ \varphi(x) \le \varphi(y). \tag{II.20} \]

(This terminology might seem somewhat inappropriate because the condition (II.20) expresses preservation of order rather than convexity. However, the above two propositions do show that ordinary convex functions are related to this notion. Also, if $x \prec y$, then $x$ is obtained from $y$ by an averaging procedure. The condition (II.20) says that the value of $\varphi$ is diminished when such a procedure is applied to its argument. Later on, we will come across other notions of averaging, and corresponding notions of convexity.)


We will study more general maps that include Schur-convex maps. 
Consider maps $\Phi : \mathbb{R}^n \to \mathbb{R}^m$. The domain of $\Phi$ will be either all of $\mathbb{R}^n$ or some convex set invariant under coordinate permutations of its elements. Such a map will be called monotone increasing if

\[ x \le y \ \Rightarrow\ \Phi(x) \le \Phi(y), \]

monotone decreasing if $-\Phi$ is monotone increasing, convex if

\[ \Phi(tx + (1 - t)y) \le t\Phi(x) + (1 - t)\Phi(y), \quad 0 \le t \le 1, \]

concave if $-\Phi$ is convex, isotone if

\[ x \prec y \ \Rightarrow\ \Phi(x) \prec_w \Phi(y), \]

strongly isotone if

\[ x \prec_w y \ \Rightarrow\ \Phi(x) \prec_w \Phi(y), \]

and strictly isotone if

\[ x \prec y \ \Rightarrow\ \Phi(x) \prec \Phi(y). \]

Note that when $m = 1$ isotone maps are precisely the Schur-convex maps. The next few propositions provide examples of such maps. We will denote by $S_n$ the group of $n \times n$ permutation matrices.


Theorem II.3.3 Let $\Phi : \mathbb{R}^n \to \mathbb{R}^m$ be a convex map. Suppose that for any $P \in S_n$ there exists $P' \in S_m$ such that

\[ \Phi(Px) = P'\Phi(x) \quad \text{for all } x \in \mathbb{R}^n. \tag{II.21} \]

Then $\Phi$ is isotone. In addition, if $\Phi$ is monotone increasing, then $\Phi$ is strongly isotone.

Proof. Let $x \prec y$ in $\mathbb{R}^n$. By Theorem II.1.10 there exist $P_1, \ldots, P_N$ in $S_n$ and positive real numbers $t_1, \ldots, t_N$ with $\sum t_j = 1$ such that

\[ x = \sum t_j P_j y. \]

So, by the convexity of $\Phi$ and the property (II.21),

\[ \Phi(x) \le \sum t_j \Phi(P_j y) = \sum t_j P'_j \Phi(y) = z, \ \text{say}. \]

Then $z \prec \Phi(y)$ and $\Phi(x) \le z$. So $\Phi(x) \prec_w \Phi(y)$. This proves that $\Phi$ is isotone.

Suppose $\Phi$ is also monotone increasing. Let $u \prec_w y$. Then by Theorem II.2.8 there exists $x$ such that $u \le x \prec y$. Hence $\Phi(u) \le \Phi(x)$ and $\Phi(x) \prec_w \Phi(y)$. So, $\Phi(u) \prec_w \Phi(y)$. This proves $\Phi$ is strongly isotone. ∎
Corollary II.3.4 If $\varphi : \mathbb{R} \to \mathbb{R}$ is a convex function, then the induced map $\varphi : \mathbb{R}^n \to \mathbb{R}^n$ is isotone. If $\varphi$ is convex and monotone on $\mathbb{R}$, then the induced map is strongly isotone on $\mathbb{R}^n$.


Note that one part of Theorem II.3.1 and Exercise II.3.2 is subsumed by 
the above corollary. 


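Example II.3.5 From the above results we can conclude that

(i) $x \prec y$ in $\mathbb{R}^n$ $\Rightarrow$ $|x| \prec_w |y|$.

(ii) $x \prec y$ in $\mathbb{R}^n$ $\Rightarrow$ $x^2 \prec_w y^2$.

(iii) $x \prec_w y$ in $\mathbb{R}^n_+$ $\Rightarrow$ $x^p \prec_w y^p$ for $p \ge 1$.

(iv) $x \prec_w y$ in $\mathbb{R}^n$ $\Rightarrow$ $x^{+} \prec_w y^{+}$.

(v) If $\varphi$ is any function such that $\varphi(e^t)$ is convex and monotone increasing in $t$, then $\log x \prec_w \log y$ in $\mathbb{R}^n_+$ $\Rightarrow$ $\varphi(x) \prec_w \varphi(y)$.

(vi) $\log x \prec_w \log y$ in $\mathbb{R}^n_+$ $\Rightarrow$ $x \prec_w y$.

(vii) For $x, y \in \mathbb{R}^n_+$

\[ \prod_{j=1}^{k} x_j^{\downarrow} \le \prod_{j=1}^{k} y_j^{\downarrow},\ 1 \le k \le n \ \Rightarrow\ \sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow},\ 1 \le k \le n. \]

Here $\mathbb{R}^n_+$ stands for the collection of vectors $x \ge 0$ (or, at places, $x > 0$). All functions are understood in the coordinatewise sense. Thus, e.g., $|x| = (|x_1|, \ldots, |x_n|)$.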


As an application we have the following very useful theorem. 


Theorem II.3.6 (Weyl's Majorant Theorem) Let $A$ be an $n \times n$ matrix with singular values $s_1 \ge \cdots \ge s_n$ and eigenvalues $\lambda_1, \ldots, \lambda_n$ arranged in such a way that $|\lambda_1| \ge \cdots \ge |\lambda_n|$. Then for every function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ such that $\varphi(e^t)$ is convex and monotone increasing in $t$, we have

\[ (\varphi(|\lambda_1|), \ldots, \varphi(|\lambda_n|)) \prec_w (\varphi(s_1), \ldots, \varphi(s_n)). \tag{II.22} \]

In particular, we have

\[ (|\lambda_1|^p, \ldots, |\lambda_n|^p) \prec_w (s_1^p, \ldots, s_n^p) \tag{II.23} \]

for all $p > 0$.

Proof. The spectral radius of a matrix is bounded by its operator norm. Hence,

\[ |\lambda_1| \le \|A\| = s_1. \]

Apply this argument to the antisymmetric tensor powers $\wedge^k A$. This gives

\[ \prod_{j=1}^{k} |\lambda_j| \le \prod_{j=1}^{k} s_j, \quad 1 \le k \le n. \tag{II.24} \]

Now use the assertion of II.3.5 (vii). ∎

Note that we have

\[ \prod_{j=1}^{n} |\lambda_j| = \prod_{j=1}^{n} s_j, \tag{II.25} \]

both the expressions being equal to $(\det A^*A)^{1/2}$.


Remark II.3.7 Returning to Theorem II.3.3, we note that when $m = 1$ the condition (II.21) just says that $\Phi$ is permutation invariant; i.e.,

\[ \Phi(Px) = \Phi(x), \tag{II.26} \]

for all $x \in \mathbb{R}^n$ and $P \in S_n$. So, in this case Theorem II.3.3 says that if a function $\Phi : \mathbb{R}^n \to \mathbb{R}$ is convex and permutation invariant, then it is isotone (i.e., Schur-convex).

Also note that every isotone function $\Phi$ from $\mathbb{R}^n$ to $\mathbb{R}$ has to be permutation invariant because $Px$ and $x$ majorise each other and hence isotony of $\Phi$ implies equality of $\Phi(Px)$ and $\Phi(x)$ in this case.

However, we will see that not every isotone function from $\mathbb{R}^n$ to $\mathbb{R}$ (i.e., not every Schur-convex function) is convex.


Exercise II.3.8 Let $\Psi : \mathbb{R}^n \to \mathbb{R}$ be any convex function and let $\Phi(x) = \max_{P \in S_n} \Psi(Px)$. Prove that $\Phi$ is isotone. If, in addition, $\Psi$ is monotone increasing, then $\Phi$ is strongly isotone.


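Exercise II.3.9 Let $\varphi : \mathbb{R} \to \mathbb{R}$ be convex. For each $k = 1, 2, \ldots, n$, define functions $\varphi^{(k)} : \mathbb{R}^n \to \mathbb{R}$ by

\[ \varphi^{(k)}(x) = \max_{\sigma} \sum_{j=1}^{k} \varphi(x_{\sigma(j)}), \]

where $\sigma$ runs over all permutations on $n$ symbols. Then $\varphi^{(k)}$ is isotone. If, in addition, $\varphi$ is monotone increasing, then $\varphi^{(k)}$ is strongly isotone. Note that this applies, in particular, to

\[ \varphi^{(n)}(x) = \sum_{j=1}^{n} \varphi(x_j) = \operatorname{tr} \varphi(x). \]

Compare this with Theorem II.3.1. The special choice $\varphi(t) = t$ gives $\varphi^{(k)}(x) = \sum_{j=1}^{k} x_j^{\downarrow}$.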
q7=1 
Example II.3.10 For $x \in \mathbb{R}^n$ let $\bar{x} = \frac{1}{n} \sum_j x_j$. Let

\[ V(x) = \frac{1}{n} \sum_{j} (x_j - \bar{x})^2. \]

This is called the variance function. Since the maps $x_j \mapsto (x_j - \bar{x})^2$ are convex, $V(x)$ is isotone (i.e., Schur-convex).


Example II.3.11 For $x \in \mathbb{R}^n_+$ let

\[ H(x) = -\sum_{j} x_j \log x_j, \]

where by convention we put $t \log t = 0$ if $t = 0$. Then $H$ is called the entropy function. Since the function $f(t) = t \log t$ is convex for $t \ge 0$, we see that $-H(x)$ is isotone. (This is sometimes expressed by saying that the entropy function is anti-isotone or Schur-concave on $\mathbb{R}^n_+$.) In particular, if $x_j \ge 0$ and $\sum x_j = 1$ we have

\[ H(1, 0, \ldots, 0) \le H(x_1, \ldots, x_n) \le H\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right), \]

which is a basic fact about entropy.
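The two-sided bound on the entropy can be observed on a concrete probability vector. The Python sketch below is illustrative: the function name `entropy` and the sample vector are assumptions, natural logarithms are used, and floating-point arithmetic is accepted since only inequalities are of interest.

```python
import math

def entropy(x):
    """H(x) = -sum x_j log x_j, with the convention 0 log 0 = 0."""
    return sum(-v * math.log(v) for v in x if v > 0)

n = 4
point_mass = [1.0, 0.0, 0.0, 0.0]
x = [0.5, 0.25, 0.125, 0.125]
uniform = [1.0 / n] * n
print(entropy(point_mass), entropy(x), entropy(uniform))
```

The three values increase from left to right: the point mass has entropy $0$, the uniform vector has entropy $\log n$, and every other probability vector lies in between.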


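Example II.3.12 For $p \ge 1$ the function

\[ \Phi(x) = \sum_{j=1}^{n} \left(x_j + \frac{1}{x_j}\right)^{p} \]

is isotone on $\mathbb{R}^n_+$. In particular, if $x_j > 0$ and $\sum x_j = 1$, we have

\[ \sum_{j=1}^{n} \left(x_j + \frac{1}{x_j}\right)^{p} \ge \frac{(n^2 + 1)^p}{n^{p-1}}. \]
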
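Example II.3.13 A function $\Phi : \mathbb{R}^n \to \mathbb{R}_+$ is called a symmetric gauge function if

(i) $\Phi$ is a norm on the real vector space $\mathbb{R}^n$,

(ii) $\Phi(Px) = \Phi(x)$ for all $x \in \mathbb{R}^n$, $P \in S_n$,

(iii) $\Phi(\varepsilon_1 x_1, \ldots, \varepsilon_n x_n) = \Phi(x_1, \ldots, x_n)$ if $\varepsilon_j = \pm 1$,

(iv) $\Phi(1, 0, \ldots, 0) = 1$.

(The last condition is an inessential normalisation.) Examples of symmetric gauge functions are

\[ \Phi_p(x) = \Bigl(\sum_{j=1}^{n} |x_j|^p\Bigr)^{1/p}, \quad 1 \le p < \infty, \]
\[ \Phi_\infty(x) = \max_{1 \le j \le n} |x_j|. \]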
These norms are commonly used in functional analysis. If the coordinates of $x$ are arranged so as to have $|x_1| \ge \cdots \ge |x_n|$, then

\[ \Phi_{(k)}(x) = \sum_{j=1}^{k} |x_j|, \quad 1 \le k \le n, \]

is also a symmetric gauge function. This is a consequence of the majorisations (II.29) and (i) in Examples II.3.5.

Every symmetric gauge function is convex on $\mathbb{R}^n$ and is monotone on $\mathbb{R}^n_+$ (Problem II.5.11). Hence by Theorem II.3.3 it is strongly isotone; i.e.,

\[ x \prec_w y \ \text{in } \mathbb{R}^n_+ \ \Rightarrow\ \Phi(x) \le \Phi(y). \]


For differentiable functions there are necessary and sufficient conditions 
characterising Schur-convexity: 


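Theorem II.3.14 A differentiable function $\Phi : \mathbb{R}^n \to \mathbb{R}$ is isotone if and only if

(i) $\Phi$ is permutation invariant, and

(ii) for each $x \in \mathbb{R}^n$ and for all $i, j$

\[ (x_i - x_j)\left(\frac{\partial \Phi}{\partial x_i}(x) - \frac{\partial \Phi}{\partial x_j}(x)\right) \ge 0. \]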

Proof. We have already observed that every isotone function is permutation invariant. To see that it also satisfies (ii), let $i = 1$, $j = 2$, without any loss of generality. For $0 \le t \le 1$ let

\[ x(t) = ((1 - t)x_1 + t x_2,\ t x_1 + (1 - t) x_2,\ x_3, \ldots, x_n). \tag{II.27} \]

Then $x(t) \prec x = x(0)$. Hence $\Phi(x(t)) \le \Phi(x(0))$, and therefore

\[ 0 \ge \frac{d}{dt} \Phi(x(t)) \Big|_{t=0} = -(x_1 - x_2)\left(\frac{\partial \Phi}{\partial x_1}(x) - \frac{\partial \Phi}{\partial x_2}(x)\right). \]

This proves (ii).

Conversely, suppose $\Phi$ satisfies (i) and (ii). We want to prove that $\Phi(u) \le \Phi(x)$ if $u \prec x$. By Theorem II.1.10 and the permutation invariance of $\Phi$ we may assume that

\[ u = ((1 - s)x_1 + s x_2,\ s x_1 + (1 - s) x_2,\ x_3, \ldots, x_n) \]

for some $0 \le s \le \frac{1}{2}$. Let $x(t)$ be as in (II.27). Then

\[ \begin{aligned}
\Phi(u) - \Phi(x) &= \int_0^s \frac{d}{dt} \Phi(x(t))\, dt \\
&= -\int_0^s (x_1 - x_2)\left[\frac{\partial \Phi}{\partial x_1}(x(t)) - \frac{\partial \Phi}{\partial x_2}(x(t))\right] dt \\
&= -\int_0^s \frac{x(t)_1 - x(t)_2}{1 - 2t}\left[\frac{\partial \Phi}{\partial x_1}(x(t)) - \frac{\partial \Phi}{\partial x_2}(x(t))\right] dt \\
&\le 0,
\end{aligned} \]

because of (ii) and the condition $0 \le s \le \frac{1}{2}$. ∎


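Example II.3.15 (A Schur-convex function that is not convex) Let $\Phi : I^2 \to \mathbb{R}$, where $I = (0, 1)$, be the function

\[ \Phi(x_1, x_2) = \log\left(\frac{1}{x_1} - 1\right) + \log\left(\frac{1}{x_2} - 1\right). \]

Using Theorem II.3.14 one can check that $\Phi$ is Schur-convex on the set

\[ \{x : x \in I^2,\ x_1 + x_2 \le 1\}. \]

However, the function $\log\left(\frac{1}{t} - 1\right)$ is convex on $(0, \frac{1}{2}]$ but not on $[\frac{1}{2}, 1)$.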


Example II.3.16 (The elementary symmetric polynomials) For each $k = 1, 2, \ldots, n$, let $S_k : \mathbb{R}^n \to \mathbb{R}$ be the functions

\[ S_k(x) = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} x_{i_1} x_{i_2} \cdots x_{i_k}. \]

These are called the elementary symmetric polynomials of the $n$ variables $x_1, \ldots, x_n$. These are invariant under permutations. We have the identities

\[ \frac{\partial}{\partial x_j} S_k(x_1, \ldots, x_n) = S_{k-1}(x_1, \ldots, \hat{x}_j, \ldots, x_n) \]

and

\[ S_k(x_1, \ldots, \hat{x}_i, \ldots, x_n) - S_k(x_1, \ldots, \hat{x}_j, \ldots, x_n) = (x_j - x_i)\, S_{k-1}(x_1, \ldots, \hat{x}_i, \ldots, \hat{x}_j, \ldots, x_n), \]

where the circumflex indicates that the term below it has been omitted. Using these one finds via Theorem II.3.14 that each $S_k$ is Schur-concave; i.e., $-S_k$ is isotone, on $\mathbb{R}^n_+$.

The special case $k = n$ says that if $x, y \in \mathbb{R}^n_+$ and $x \prec y$, then $\prod_{j=1}^{n} x_j \ge \prod_{j=1}^{n} y_j$.
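The Schur-concavity of the elementary symmetric polynomials can be seen on a small example. The Python sketch below is an illustration only; the function name `elementary_symmetric` and the sample vectors are assumptions of the sketch.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(k, x):
    """S_k(x): the sum of the products of the coordinates of x over all k-element index sets."""
    return sum(prod(c) for c in combinations(x, k))

y = [6, 3, 0]
x = [4, 3, 2]   # x is majorised by y
for k in (1, 2, 3):
    print(k, elementary_symmetric(k, x), elementary_symmetric(k, y))
```

For $k = 1$ the two values agree, since $S_1$ is the trace; for $k = 2, 3$ the value at $x$ is the larger one, and $k = 3$ is the product inequality stated above.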
Theorem II.3.17 (The Hadamard Determinant Theorem) If $A$ is an $n \times n$ positive matrix, then

\[ \det A \le \prod_{j=1}^{n} a_{jj}. \]
Proof. Use Schur's Theorem (Exercise II.1.12) and the above statement about the Schur-concavity of the function $f(x) = \prod_j x_j$ on $\mathbb{R}^n_+$. ∎
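Hadamard's inequality can be checked on a small positive definite matrix. The Python sketch below is illustrative: the recursive `det` helper (Laplace expansion along the first row) and the tridiagonal test matrix are choices made for the sketch and are adequate only for small sizes.

```python
from math import prod

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

# A symmetric positive definite matrix: all leading principal minors (2, 3, 4) are positive.
A = [[2, 1, 0],
     [1, 2, 1],
     [0, 1, 2]]
diagonal_product = prod(A[j][j] for j in range(len(A)))
print(det(A), diagonal_product)
```

The determinant does not exceed the product of the diagonal entries, in agreement with the theorem.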

More generally, if $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of a positive matrix $A$, we have for $k = 1, 2, \ldots, n$

\[ S_k(\lambda_1, \ldots, \lambda_n) \le S_k(a_{11}, \ldots, a_{nn}). \tag{II.28} \]


Exercise II.3.18 If $A$ is an $m \times n$ complex matrix, then

\[ \det(AA^*) \le \prod_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2. \]

(See Exercise I.1.3.)


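Exercise II.3.19 Show that the ratio $S_k(x)/S_{k-1}(x)$ is Schur-concave on the set of positive vectors for $k = 2, \ldots, n$. Hence, if $A$ is a positive matrix, then

\[ \frac{S_n(a_{11}, \ldots, a_{nn})}{S_n(\lambda_1, \ldots, \lambda_n)} \ge \frac{S_{n-1}(a_{11}, \ldots, a_{nn})}{S_{n-1}(\lambda_1, \ldots, \lambda_n)} \ge \cdots \ge \frac{S_1(a_{11}, \ldots, a_{nn})}{S_1(\lambda_1, \ldots, \lambda_n)} = \frac{\operatorname{tr} A}{\operatorname{tr} A} = 1. \]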

Proposition II.3.20 If $A$ is an $n \times n$ positive definite matrix, then

$$(\det A)^{1/n} = \min\left\{ \frac{\operatorname{tr} AB}{n} : B \text{ is positive and } \det B = 1 \right\}.$$

If $A$ is positive semidefinite, then the same relation holds with min replaced
by inf.


Proof. It suffices to prove the statement about positive definite matrices;
the semidefinite case follows by a continuity argument. Using the spectral
theorem and the cyclicity of the trace, the general case of the proposition
can be reduced to the special case when $A$ is diagonal. So, let $A$ be diagonal
with diagonal entries $\lambda_1, \ldots, \lambda_n$. Then, using the arithmetic-geometric
mean inequality and Theorem II.3.17 we have

$$\frac{\operatorname{tr} AB}{n} = \frac{1}{n} \sum_j \lambda_j b_{jj} \ge \Big(\prod_j \lambda_j\Big)^{1/n} \Big(\prod_j b_{jj}\Big)^{1/n} \ge (\det A)^{1/n} (\det B)^{1/n},$$

for every positive matrix $B$. Hence, $\frac{\operatorname{tr} AB}{n} \ge (\det A)^{1/n}$ if $\det B = 1$. When
$B = (\det A)^{1/n} A^{-1}$ this becomes an equality. ∎


Corollary II.3.21 (The Minkowski Determinant Theorem) If $A, B$ are
$n \times n$ positive matrices then

$$(\det(A + B))^{1/n} \ge (\det A)^{1/n} + (\det B)^{1/n}.$$


II.4 Binary Algebraic Operations and Majorisation


For $x \in \mathbb{R}^n$ we have seen in Section II.1 that

$$\sum_{j=1}^k x_j^{\downarrow} = \max_{|I| = k} \langle x, e_I \rangle.$$

It follows that if $x, y \in \mathbb{R}^n$, then
$$x + y \prec x^{\downarrow} + y^{\downarrow}. \tag{II.29}$$


In this section we will study majorisation relations of this form for sums, 
products, and other functions of two vectors. 
A map $\varphi : \mathbb{R}^2 \to \mathbb{R}$ is called lattice superadditive if

$$\varphi(s_1, t_1) + \varphi(s_2, t_2) \le \varphi(s_1 \vee s_2, t_1 \vee t_2) + \varphi(s_1 \wedge s_2, t_1 \wedge t_2). \tag{II.30}$$

We will call a map $\varphi$ monotone if it is either monotonically increasing or
monotonically decreasing in each of its arguments.

In this section we will adopt the following notation. Given $\varphi : \mathbb{R}^2 \to \mathbb{R}$,
we will denote by $\Phi$ the map from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}^n$ defined as

$$\Phi(x, y) = (\varphi(x_1, y_1), \ldots, \varphi(x_n, y_n)). \tag{II.31}$$


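Example II.4.1 (i) $\varphi(s, t) = s + t$ is a monotone and lattice superadditive
function on $\mathbb{R}^2$.

(ii) $\varphi(s, t) = st$ is a monotone and lattice superadditive function on $\mathbb{R}^2_+$.

For (i) above we have
$$\Phi(x, y) = (x_1 + y_1, \ldots, x_n + y_n) \quad \text{for } x, y \in \mathbb{R}^n,$$
and for (ii) we have
$$\Phi(x, y) = (x_1 y_1, \ldots, x_n y_n) \quad \text{for } x, y \in \mathbb{R}^n_+.$$

Theorem II.4.2 If $\varphi$ is monotone and lattice superadditive, then
$$\Phi(x^{\downarrow}, y^{\uparrow}) \prec_w \Phi(x, y) \prec_w \Phi(x^{\downarrow}, y^{\downarrow}) \tag{II.32}$$
for all $x, y \in \mathbb{R}^n$.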


Proof. Note that if we apply a coordinate permutation simultaneously to
$x$ and $y$, then $\Phi(x, y)$ undergoes the same coordinate permutation. The two
outer terms in (II.32) remain unaffected and so do the majorisations. Hence,
to prove (II.32) we may assume that $x = x^{\downarrow}$; i.e., $x_1 \ge x_2 \ge \cdots \ge x_n$. Next
note that we can find a finite sequence of vectors $u^{(0)}, u^{(1)}, \ldots, u^{(N)}$ such
that

$$y^{\downarrow} = u^{(0)}, \quad y^{\uparrow} = u^{(N)}, \quad y = u^{(j)} \ \text{ for some } 0 \le j \le N,$$
and each $u^{(k+1)}$ is obtained from $u^{(k)}$ by interchanging two components in
such a way as to move from the arrangement $y^{\downarrow}$ to $y^{\uparrow}$; i.e., we pick up two
indices $i, j$ such that

$$i < j \quad \text{and} \quad u^{(k)}_i > u^{(k)}_j$$

and interchange these two components to obtain the vector $u^{(k+1)}$. So, to
prove (II.32) it suffices to prove

$$\Phi(x, u^{(k+1)}) \prec_w \Phi(x, u^{(k)}) \tag{II.33}$$

for $k = 0, 1, \ldots, N-1$. Since we have already assumed $x_1 \ge x_2 \ge \cdots \ge x_n$,
to prove (II.33) we need to prove the two-dimensional majorisation

$$(\varphi(s_1, t_2), \varphi(s_2, t_1)) \prec_w (\varphi(s_1, t_1), \varphi(s_2, t_2)) \tag{II.34}$$

if $s_1 \ge s_2$ and $t_1 \ge t_2$. Now, by the definition of weak majorisation, this is
equivalent to the two inequalities

$$\varphi(s_1, t_2) \vee \varphi(s_2, t_1) \le \varphi(s_1, t_1) \vee \varphi(s_2, t_2),$$
$$\varphi(s_1, t_2) + \varphi(s_2, t_1) \le \varphi(s_1, t_1) + \varphi(s_2, t_2),$$

for $s_1 \ge s_2$ and $t_1 \ge t_2$. The first of these follows from the monotony of $\varphi$
and the second from the lattice superadditivity. ∎


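Corollary II.4.3 For $x, y \in \mathbb{R}^n$

$$x^{\downarrow} + y^{\uparrow} \prec x + y \prec x^{\downarrow} + y^{\downarrow}. \tag{II.35}$$
For $x, y \in \mathbb{R}^n_+$

$$x^{\downarrow} \cdot y^{\uparrow} \prec_w x \cdot y \prec_w x^{\downarrow} \cdot y^{\downarrow}, \tag{II.36}$$
where $x \cdot y = (x_1 y_1, \ldots, x_n y_n)$.


Corollary II.4.4 For $x, y \in \mathbb{R}^n$
$$\langle x^{\downarrow}, y^{\uparrow} \rangle \le \langle x, y \rangle \le \langle x^{\downarrow}, y^{\downarrow} \rangle. \tag{II.37}$$


Proof. If $x \ge 0$ and $y \ge 0$, this follows from (II.36). In the general case,
choose $t$ large enough so that $x + te \ge 0$ and $y + te \ge 0$ and apply the
special result. ∎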
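
The inequality (II.37) can be checked by brute force on a small example. The following Python sketch is an illustration only (the vectors are arbitrary sample data and `inner` is our own helper): it evaluates $\langle x, \sigma(y) \rangle$ over all permutations $\sigma$ of the coordinates of $y$ and compares the extreme values with the two bounds in (II.37).

```python
from itertools import permutations

def inner(x, y):
    """Standard inner product of two real vectors."""
    return sum(a * b for a, b in zip(x, y))

x = [3, -1, 4, 1]
y = [2, 7, -1, 8]

x_down = sorted(x, reverse=True)   # x rearranged in decreasing order
y_down = sorted(y, reverse=True)
y_up = sorted(y)                   # y rearranged in increasing order

lower = inner(x_down, y_up)        # left-hand side of (II.37)
upper = inner(x_down, y_down)      # right-hand side of (II.37)
values = [inner(x, p) for p in permutations(y)]

print(lower, min(values), inner(x, y), max(values), upper)   # 1 1 3 56 56
```

Both bounds are attained by some rearrangement, so on this data they are the exact minimum and maximum over all pairings of the coordinates.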


The inequality (II.37) has a "mechanical" interpretation when $x \ge 0$
and $y \ge 0$. On a rod fixed at the origin, hang weights $y_i$ at the points
at distances $x_i$ from the origin. The inequality (II.37) then says that the
maximum moment is obtained if the heaviest weights are the farthest from
the origin.

Exercise II.4.5 The function $\varphi : \mathbb{R}^2 \to \mathbb{R}$ defined as $\varphi(s, t) = s \wedge t$ is
monotone and lattice superadditive on $\mathbb{R}^2$. Hence, for $x, y \in \mathbb{R}^n$

$$x^{\downarrow} \wedge y^{\uparrow} \prec_w x \wedge y \prec_w x^{\downarrow} \wedge y^{\downarrow}.$$


II.5 Problems


Problem II.5.1. If a doubly stochastic matrix $A$ is invertible and $A^{-1}$ is
also doubly stochastic, then $A$ is a permutation.


Problem II.5.2. Let $y \in \mathbb{R}^n_+$. The set $\{x : x \in \mathbb{R}^n_+,\ x \prec_w y\}$ is the convex
hull of the points $(r_1 y_{\sigma(1)}, \ldots, r_n y_{\sigma(n)})$, where $\sigma$ varies over permutations
and each $r_j$ is either 0 or 1.


Problem II.5.3. Let $y \in \mathbb{R}^n$. The set $\{x \in \mathbb{R}^n : |x| \prec_w |y|\}$ is the
convex hull of points of the form $(\varepsilon_1 y_{\sigma(1)}, \ldots, \varepsilon_n y_{\sigma(n)})$, where $\sigma$ varies over
permutations and each $\varepsilon_j = \pm 1$.


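Problem II.5.4. Let $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ be a $2 \times 2$ block matrix and let
$\mathcal{C}(A) = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}$. If $U = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}$, then we can write

$$\mathcal{C}(A) = \tfrac{1}{2}(A + UAU^*).$$

Let $\lambda(A)$ and $s(A)$ denote the $n$-vectors whose coordinates are the eigenvalues
and the singular values of $A$, respectively.
Use (II.18) to show that

$$s(\mathcal{C}(A)) \prec_w s(A).$$
If $A$ is Hermitian, use (II.16) to show that
$$\lambda(\mathcal{C}(A)) \prec \lambda(A).$$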


Problem II.5.5. More generally, let $P_1, \ldots, P_r$ be a family of mutually
orthogonal projections in $\mathbb{C}^n$ such that $\oplus P_j = I$. Then the operation of
taking $A$ to $\mathcal{C}(A) = \sum_j P_j A P_j$ is called a pinching of $A$. In an appropriate
choice of basis this means that

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1r} \\ \vdots & \vdots & & \vdots \\ A_{r1} & A_{r2} & \cdots & A_{rr} \end{pmatrix}, \qquad \mathcal{C}(A) = \begin{pmatrix} A_{11} & & \\ & \ddots & \\ & & A_{rr} \end{pmatrix}.$$

Each such pinching is a product of $r - 1$ pinchings of the $2 \times 2$ type introduced
in Problem II.5.4. Show that for every pinching $\mathcal{C}$

$$s(\mathcal{C}(A)) \prec_w s(A) \tag{II.38}$$
for all matrices $A$, and
$$\lambda(\mathcal{C}(A)) \prec \lambda(A) \tag{II.39}$$

for all Hermitian matrices $A$. When $P_1, \ldots, P_n$ are the projections onto the
coordinate axes, we get as a special case of (II.38) above

$$|\operatorname{tr} A| \le \sum_{j=1}^n s_j(A) = \|A\|_1. \tag{II.40}$$
From (II.39) we get as a special case Schur's Theorem
$$\operatorname{diag}(A) \prec \lambda(A),$$
which we saw before in Exercise II.1.12.


Problem II.5.6. Let $A$ be positive. Then

$$\det A \le \det \mathcal{C}(A), \tag{II.41}$$

for every pinching $\mathcal{C}$. This is called Fischer's inequality and includes the
Hadamard Determinant Theorem as a special case.


Problem II.5.7. For each $k = 1, 2, \ldots, n$ and for each pinching $\mathcal{C}$ show
that for positive definite $A$

$$S_k(\lambda(A)) \le S_k(\lambda(\mathcal{C}(A))), \tag{II.42}$$

where $S_k(\lambda(A))$ denotes the $k$th elementary symmetric polynomial of the
eigenvalues of $A$. This inequality, due to Ostrowski, includes (II.28) as a
special case. It also includes (II.41) as a special case.


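Problem II.5.8. If $\wedge^k A$ denotes the $k$th antisymmetric tensor power of
$A$, then the above inequality can be written as

$$\operatorname{tr} \wedge^k A \le \operatorname{tr} \wedge^k (\mathcal{C}(A)). \tag{II.43}$$

The operator inequality
$$\wedge^k A \le \wedge^k (\mathcal{C}(A))$$

is not always true. This is shown by the following example. Let

$$A = \begin{pmatrix} 2 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 2 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$

and let $\mathcal{C}$ be the pinching induced by the pair of projections $P$ and $I - P$.
(The space $\wedge^2 \mathbb{C}^4$ is 6-dimensional.)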
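
The example of Problem II.5.8 can be explored with a few lines of Python. This is a sketch under our own assumptions, not the book's solution: we take $k = 2$ and represent $\wedge^2 A$ by the second compound matrix (all $2 \times 2$ minors, indexed by pairs $i < j$), and the names `compound2`, `CA`, `D` are ours. The script forms $D = \wedge^2 \mathcal{C}(A) - \wedge^2 A$ and looks at its trace and at one $2 \times 2$ principal minor.

```python
from itertools import combinations

def compound2(m):
    """Second compound matrix of m: entry ((r1,r2),(c1,c2)) is the 2x2 minor
    of m on rows r1<r2 and columns c1<c2 (pairs in lexicographic order)."""
    pairs = list(combinations(range(len(m)), 2))
    return [[m[r1][c1] * m[r2][c2] - m[r1][c2] * m[r2][c1] for (c1, c2) in pairs]
            for (r1, r2) in pairs]

A = [[2, 0, 0, 1],
     [0, 1, 1, 0],
     [0, 1, 2, 0],
     [1, 0, 0, 1]]

# Pinching by P = diag(1, 1, 0, 0) and I - P: keep the two diagonal 2x2 blocks.
CA = [[A[i][j] if (i < 2) == (j < 2) else 0 for j in range(4)] for i in range(4)]

LA, LC = compound2(A), compound2(CA)
D = [[LC[i][j] - LA[i][j] for j in range(6)] for i in range(6)]

trace_D = sum(D[i][i] for i in range(6))
minor = D[0][0] * D[1][1] - D[0][1] * D[1][0]
print(trace_D, minor)   # 2 -4
```

The trace of $D$ is nonnegative, in agreement with (II.43), while the negative principal minor shows that $D$ is not positive semidefinite, so the operator inequality fails for this $A$.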
Problem II.5.9. Let $\{\lambda_1, \ldots, \lambda_n\}$, $\{\mu_1, \ldots, \mu_n\}$ be two $n$-tuples of complex
numbers. Let
$$d(\lambda, \mu) = \min_{\sigma} \max_{1 \le j \le n} |\lambda_j - \mu_{\sigma(j)}|,$$

where the minimum is taken over all permutations on $n$ symbols. This is
called the optimal matching distance between the unordered $n$-tuples
$\lambda$ and $\mu$. It defines a metric on the space $\mathbb{C}^n_{\mathrm{sym}}$ of such $n$-tuples. Show that
we also have

$$d(\lambda, \mu) = \max_{\substack{I, J \subset \{1, 2, \ldots, n\} \\ |I| + |J| = n + 1}} \ \min_{\substack{i \in I \\ j \in J}} |\lambda_i - \mu_j|.$$

Problem II.5.10. This problem gives a refinement of Hall's Theorem under
an additional assumption that is often fulfilled in matching problems.
In the notations introduced at the beginning of Section II.2, define

$$B_i = \{b_j : (b_j, g_i) \in R\}, \quad 1 \le i \le n.$$
This is the set of boys known to the girl $g_i$. Let
$$B_{i_1 \cdots i_k} = \bigcup_{r=1}^k B_{i_r}, \quad 1 \le i_1 < \cdots < i_k \le n.$$

Suppose that for each $k = 1, 2, \ldots, [\tfrac{n}{2}]$ and for every choice of indices
$1 \le i_1 < \cdots < i_k \le n$,

$$|G_{i_1 \cdots i_k}| \ge k \quad \text{and} \quad |B_{i_1 \cdots i_k}| \ge k.$$
Show that then
$$|G_{i_1 \cdots i_k}| \ge k \quad \text{for all } k = 1, 2, \ldots, n, \ 1 \le i_1 < \cdots < i_k \le n.$$
Hence a compatible matching between $B$ and $G$ exists.


Problem II.5.11. (i) Show that every symmetric gauge function is continuous.

(ii) Show that if $\Phi$ is a symmetric gauge function, then $\Phi_\infty(x) \le \Phi(x) \le \Phi_1(x)$
for all $x \in \mathbb{R}^n$.

(iii) If $\Phi$ is a symmetric gauge function and $0 \le t_j \le 1$, then

$$\Phi(t_1 x_1, \ldots, t_n x_n) \le \Phi(x_1, \ldots, x_n).$$

(iv) Every symmetric gauge function is monotone on $\mathbb{R}^n_+$.

(v) If $x, y \in \mathbb{R}^n$ and $|x| \le |y|$, then $\Phi(x) \le \Phi(y)$ for every symmetric
gauge function $\Phi$.

(vi) If $x, y \in \mathbb{R}^n_+$, then $x \prec_w y$ if and only if $\Phi(x) \le \Phi(y)$ for every
symmetric gauge function $\Phi$.


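Problem II.5.12. Let $f : \mathbb{R}_+ \to \mathbb{R}_+$ be a concave function such that
$f(0) = 0$.

(i) Show that $f$ is subadditive: $f(a + b) \le f(a) + f(b)$ for all $a, b \in \mathbb{R}_+$.

(ii) Let $\Phi : \mathbb{R}^{2n}_+ \to \mathbb{R}_+$ be defined as

$$\Phi(x, y) = \sum_{j=1}^n f(x_j) + \sum_{j=1}^n f(y_j), \quad x, y \in \mathbb{R}^n_+.$$

Then $\Phi$ is Schur-concave.

(iii) Note that for $x, y \in \mathbb{R}^n_+$
$$(x, y) \prec (x + y, 0) \quad \text{in } \mathbb{R}^{2n}_+.$$

(iv) From (ii) and (iii) conclude that the function

$$F(x) = \sum_{j=1}^n f(|x_j|)$$

is subadditive on $\mathbb{R}^n$.

(v) Special examples lead to the following inequalities for vectors
$x, y \in \mathbb{R}^n$:

$$\sum_{j=1}^n |x_j + y_j|^p \le \sum_{j=1}^n |x_j|^p + \sum_{j=1}^n |y_j|^p, \quad 0 < p \le 1.$$

$$\sum_{j=1}^n \frac{|x_j + y_j|}{1 + |x_j + y_j|} \le \sum_{j=1}^n \frac{|x_j|}{1 + |x_j|} + \sum_{j=1}^n \frac{|y_j|}{1 + |y_j|}.$$

$$\sum_{j=1}^n \log(1 + |x_j + y_j|) \le \sum_{j=1}^n \log(1 + |x_j|) + \sum_{j=1}^n \log(1 + |y_j|).$$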

Problem II.5.13. Show that a map $\varphi : \mathbb{R}^2 \to \mathbb{R}$ is lattice superadditive if
and only if
$$\varphi(x_1 + \delta_1, x_2 - \delta_2) + \varphi(x_1 - \delta_1, x_2 + \delta_2) \le \varphi(x_1 + \delta_1, x_2 + \delta_2) + \varphi(x_1 - \delta_1, x_2 - \delta_2)$$

for all $(x_1, x_2)$ and for all $\delta_1, \delta_2 \ge 0$. If $\varphi$ is twice differentiable, this is
equivalent to

$$0 \le \frac{\partial^2 \varphi(x_1, x_2)}{\partial x_1 \, \partial x_2}.$$

Problem II.5.14. Let $\varphi : \mathbb{R}^2 \to \mathbb{R}$ be a monotone increasing lattice superadditive
function, and let $f$ be a monotone increasing and convex function
from $\mathbb{R}$ to $\mathbb{R}$. Show that if $\varphi$ and $f$ are twice differentiable, then the
composition $f \circ \varphi$ is monotone and lattice superadditive. When $\varphi(s, t) = s + t$
show that this is also true if $f$ is monotone decreasing. These statements
are also true without any differentiability assumptions.


Problem II.5.15. For $x, y \in \mathbb{R}^n_+$
$$-\log(x^{\downarrow} + y^{\uparrow}) \prec_w -\log(x + y) \prec_w -\log(x^{\downarrow} + y^{\downarrow}),$$

$$\log(x^{\downarrow} \cdot y^{\uparrow}) \prec_w \log(x \cdot y) \prec_w \log(x^{\downarrow} \cdot y^{\downarrow}).$$
From the first of these relations it follows that

$$\prod_{j=1}^n (x_j^{\downarrow} + y_j^{\downarrow}) \le \prod_{j=1}^n (x_j + y_j) \le \prod_{j=1}^n (x_j^{\downarrow} + y_j^{\uparrow}).$$


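Problem II.5.16. Let $x, y, u$ be vectors in $\mathbb{R}^n$ all having their coordinates
in decreasing order. Show that

(i) $\langle x, u \rangle \le \langle y, u \rangle$ if $x \prec y$,
(ii) $\langle x, u \rangle \le \langle y, u \rangle$ if $x \prec_w y$ and $u \in \mathbb{R}^n_+$.

In particular, this means that if $x, y \in \mathbb{R}^n_+$, $x \prec_w y$, and $u \in \mathbb{R}^n_+$, then

$$(x_1^{\downarrow} u_1^{\downarrow}, \ldots, x_n^{\downarrow} u_n^{\downarrow}) \prec_w (y_1^{\downarrow} u_1^{\downarrow}, \ldots, y_n^{\downarrow} u_n^{\downarrow}).$$
[Use Theorem II.3.14 or the telescopic summation identity

$$\sum_{j=1}^k a_j b_j = \sum_{j=1}^k (a_j - a_{j+1})(b_1 + \cdots + b_j),$$

where $a_j, b_j$, $1 \le j \le k$, are any numbers and $a_{k+1} = 0$.]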


II.6 Notes and References 


Many of the results of this chapter can be found in the classic Inequalities 
by G.H. Hardy, J.E. Littlewood, and G. Polya, Cambridge University Press, 
1934, which gave the first systematic treatment of this theme. The more 
recent treatise Inequalities: Theory of Majorization and Its Applications 
by A.W. Marshall and I. Olkin, Academic Press, 1979, is a much more 
detailed and exhaustive text devoted entirely to the study of majorisation.
It is an invaluable resource on this topic. For the reader who wants a quicker
introduction to the essentials of majorisation and its applications in linear 
algebra, the survey article Majorization, doubly stochastic matrices and 
comparison of eigenvalues by T. Ando, Linear Algebra and Its Applications, 
118(1989) 163-248, is undoubtedly the ideal course. Our presentation is 
strongly influenced by this article from which we have freely borrowed. 

The distance $d(\lambda, \mu)$ introduced in Problem II.5.9 is commonly employed
in the study of variation of roots of polynomials and eigenvalues of matrices
since these are known with no preferred ordering. See Chapter 6. The
result of Problem II.5.10 is due to L. Elsner, C. Johnson, J. Ross, and J.
Schönheim, On a generalised matching problem arising in estimating the
eigenvalue variation of two matrices, European J. Combinatorics, 4(1983)
133-136.

Several of the theorems in this chapter have converses. For illustration 
we mention two of these. 

Schur's Theorem (II.14) has a converse; it says that if $d$ and $\lambda$ are real
vectors with $d \prec \lambda$, then there exists a Hermitian matrix $A$ whose diagonal
entries are the components of $d$ and whose eigenvalues are the components
of $\lambda$.

Weyl's Majorant Theorem (II.3.6) has a converse; it says that if $\lambda_1, \ldots, \lambda_n$
are complex numbers and $s_1, \ldots, s_n$ are positive real numbers ordered as
$|\lambda_1| \ge \cdots \ge |\lambda_n|$ and $s_1 \ge \cdots \ge s_n$, and if

$$\prod_{j=1}^k |\lambda_j| \le \prod_{j=1}^k s_j \quad \text{for } 1 \le k \le n,$$

$$\prod_{j=1}^n |\lambda_j| = \prod_{j=1}^n s_j,$$

then there exists an $n \times n$ matrix $A$ whose eigenvalues are $\lambda_1, \ldots, \lambda_n$ and
singular values $s_1, \ldots, s_n$.

For more such theorems, see the book by Marshall and Olkin cited above. 

Two results very close to those in II.3.16-II.3.21 and II.5.6-II.5.8 are given 
below. 

M. Marcus and L. Lopes, Inequalities for symmetric functions and Hermitian
matrices, Canad. J. Math., 9(1957) 305-312, showed that the map
$\Phi : \mathbb{R}^n_+ \to \mathbb{R}$ given by $\Phi(x) = (S_k(x))^{1/k}$ is Schur-concave for $1 \le k \le n$.
Using this they showed that for positive matrices $A, B$

$$[\operatorname{tr} \wedge^k (A + B)]^{1/k} \ge [\operatorname{tr} \wedge^k A]^{1/k} + [\operatorname{tr} \wedge^k B]^{1/k}. \tag{II.44}$$

This can also be expressed by saying that the map $A \to (\operatorname{tr} \wedge^k A)^{1/k}$ is
concave on the set of positive matrices. For $k = n$, this reduces to the
statement

$$[\det(A + B)]^{1/n} \ge [\det A]^{1/n} + [\det B]^{1/n},$$

which is the Minkowski determinant inequality.
E.H. Lieb, Convex trace functions and the Wigner-Yanase-Dyson conjecture,
Advances in Math., 11(1973) 267-288, proved some striking operator
inequalities in connection with the W.-Y.-D. conjecture on the concavity of
entropy in quantum mechanics. These were proved by different techniques
and extended in other directions by T. Ando, Concavity of certain maps
on positive definite matrices and applications to Hadamard products, Linear
Algebra Appl., 26(1979) 203-241. One consequence of these results is the
inequality

$$\wedge^k (A + B)^{1/k} \ge \wedge^k A^{1/k} + \wedge^k B^{1/k} \tag{II.45}$$

for all positive matrices $A, B$ and for all $k = 1, 2, \ldots, n$. In particular, this
implies that

$$\operatorname{tr} \wedge^k (A + B)^{1/k} \ge \operatorname{tr} \wedge^k A^{1/k} + \operatorname{tr} \wedge^k B^{1/k}.$$

When $k = n$, this reduces to the Minkowski determinant inequality. Some
of these inequalities are proved in Chapter 9.


III 


Variational Principles for 
Eigenvalues 


In this chapter we will study inequalities that are used for localising the 
spectrum of a Hermitian operator. Such results are motivated by several 
interrelated considerations. It is not always easy to calculate the eigen- 
values of an operator. However, in many scientific problems it is enough 
to know that the eigenvalues lie in some specified intervals. Such infor- 
mation is provided by the inequalities derived here. While the functional 
dependence of the eigenvalues on an operator is quite complicated, several 
interesting relationships between the eigenvalues of two operators A, B and 
those of their sum A+ B are known. These relations are consequences of 
variational principles. When the operator B is small in comparison to A, 
then A+ B is considered as a perturbation of A or an approximation to A. 
The inequalities of this chapter then lead to perturbation bounds or error 
bounds. 

Many of the results of this chapter lead to generalisations, or analogues, 
or open problems in other settings discussed in later chapters. 


III.1 The Minimax Principle for Eigenvalues 


The following notation will be used throughout this chapter. If $A, B$ are
Hermitian operators, we will write their spectral resolutions as $Au_j = \alpha_j u_j$,
$Bv_j = \beta_j v_j$, $1 \le j \le n$, always assuming that the eigenvectors $u_j$
and the eigenvectors $v_j$ are orthonormal and that $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$
and $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$. When the dependence of the eigenvalues on the
operator is to be emphasized, we will write $\lambda^{\downarrow}(A)$ for the vector with components
$\lambda_1^{\downarrow}(A), \ldots, \lambda_n^{\downarrow}(A)$, where $\lambda_j^{\downarrow}(A)$ are arranged in decreasing order;
i.e., $\lambda_j^{\downarrow}(A) = \alpha_j$. Similarly, $\lambda^{\uparrow}(A)$ will denote the vector with components
$\lambda_j^{\uparrow}(A)$ where $\lambda_j^{\uparrow}(A) = \alpha_{n-j+1}$, $1 \le j \le n$.

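Theorem III.1.1 (Poincaré's Inequality) Let $A$ be a Hermitian operator
on $\mathcal{H}$ and let $\mathcal{M}$ be any $k$-dimensional subspace of $\mathcal{H}$. Then there exist unit
vectors $x, y$ in $\mathcal{M}$ such that $\langle x, Ax \rangle \le \lambda_k^{\downarrow}(A)$ and $\langle y, Ay \rangle \ge \lambda_k^{\uparrow}(A)$.


Proof. Let $\mathcal{N}$ be the subspace spanned by the eigenvectors $u_k, \ldots, u_n$ of
$A$ corresponding to the eigenvalues $\lambda_k^{\downarrow}(A), \ldots, \lambda_n^{\downarrow}(A)$. Then

$$\dim \mathcal{M} + \dim \mathcal{N} = n + 1,$$
and hence the intersection of $\mathcal{M}$ and $\mathcal{N}$ is nontrivial. Pick up a unit vector
$x$ in $\mathcal{M} \cap \mathcal{N}$. Then we can write $x = \sum_{j=k}^n \xi_j u_j$, where $\sum_{j=k}^n |\xi_j|^2 = 1$. Hence,

$$\langle x, Ax \rangle = \sum_{j=k}^n |\xi_j|^2 \lambda_j^{\downarrow}(A) \le \sum_{j=k}^n |\xi_j|^2 \lambda_k^{\downarrow}(A) = \lambda_k^{\downarrow}(A).$$

This proves the first statement. The second can be obtained by applying
this to the operator $-A$ instead of $A$. Equally well, one can repeat the
argument, applying it to the given $k$-dimensional space $\mathcal{M}$ and the
$(n - k + 1)$-dimensional space spanned by $u_1, u_2, \ldots, u_{n-k+1}$. ∎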


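Corollary III.1.2 (The Minimax Principle) Let $A$ be a Hermitian operator
on $\mathcal{H}$. Then

$$\lambda_k^{\downarrow}(A) = \max_{\substack{\mathcal{M} \subset \mathcal{H} \\ \dim \mathcal{M} = k}} \ \min_{\substack{x \in \mathcal{M} \\ \|x\| = 1}} \langle x, Ax \rangle = \min_{\substack{\mathcal{M} \subset \mathcal{H} \\ \dim \mathcal{M} = n - k + 1}} \ \max_{\substack{x \in \mathcal{M} \\ \|x\| = 1}} \langle x, Ax \rangle.$$

Proof. By Poincaré's inequality, if $\mathcal{M}$ is any $k$-dimensional subspace of
$\mathcal{H}$, then $\min_x \langle x, Ax \rangle \le \lambda_k^{\downarrow}(A)$, where $x$ varies over unit vectors in $\mathcal{M}$. But if
$\mathcal{M}$ is the span of $\{u_1, \ldots, u_k\}$, then this last inequality becomes an equality.
That proves the first statement. The second can be obtained from the first
by applying it to $-A$ instead of $A$. ∎


This minimax principle is sometimes called the Courant-Fischer-Weyl
minimax principle.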
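
A small experiment makes the statement concrete. The Python sketch below is our own illustration (the helper names and the choice $A = \operatorname{diag}(3, 2, 1)$ are assumptions, not from the text). For many random 2-dimensional subspaces $\mathcal{M}$ of $\mathbb{R}^3$ it computes $\min \langle x, Ax \rangle$ over unit vectors $x \in \mathcal{M}$, which equals the smaller eigenvalue of the $2 \times 2$ matrix of $A$ restricted to an orthonormal basis of $\mathcal{M}$, and records the largest value found. By Poincaré's inequality this never exceeds $\lambda_2^{\downarrow}(A) = 2$, and the coordinate plane spanned by $u_1, u_2$ attains it.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_pair(u, v):
    """Gram-Schmidt: an orthonormal basis (p, q) of span{u, v}."""
    nu = math.sqrt(dot(u, u))
    p = [a / nu for a in u]
    w = [b - dot(p, v) * a for a, b in zip(p, v)]
    nw = math.sqrt(dot(w, w))
    return p, [a / nw for a in w]

def restricted_eigs(diag, p, q):
    """Eigenvalues (larger, smaller) of the 2x2 matrix of diag(diag) on span{p, q}."""
    ap = [d * a for d, a in zip(diag, p)]
    aq = [d * a for d, a in zip(diag, q)]
    a, b, c = dot(p, ap), dot(p, aq), dot(q, aq)
    mid, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    return mid + rad, mid - rad

alpha = [3.0, 2.0, 1.0]            # A = diag(3, 2, 1)
random.seed(0)
best = -math.inf
for _ in range(1000):
    u = [random.gauss(0, 1) for _ in range(3)]
    v = [random.gauss(0, 1) for _ in range(3)]
    top, bottom = restricted_eigs(alpha, *orthonormal_pair(u, v))
    best = max(best, bottom)       # bottom = min of <x, Ax> over unit x in M

print(best <= 2.0 + 1e-9)
print(restricted_eigs(alpha, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))   # (3.0, 2.0)
```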


Exercise III.1.3 In the proof of the minimax principle we made a particular
choice of $\mathcal{M}$. This choice is not always unique. For example, if
$\lambda_k^{\downarrow}(A) = \lambda_{k+1}^{\downarrow}(A)$, there would be a whole 1-parameter family of such subspaces
obtained by choosing different eigenvectors of $A$ belonging to $\lambda_k^{\downarrow}(A)$.
This is not surprising. More surprising, perhaps even shocking, is the fact
that we could have $\lambda_k^{\downarrow}(A) = \min\{\langle x, Ax \rangle : x \in \mathcal{M}, \|x\| = 1\}$, even for a
$k$-dimensional subspace that is not spanned by eigenvectors of $A$. Find an
example where this happens. (There is a simple example.)


Exercise III.1.4 In the proof of Theorem III.1.1 we used a basic principle
of linear algebra:

$$\dim(\mathcal{M}_1 \cap \mathcal{M}_2) = \dim \mathcal{M}_1 + \dim \mathcal{M}_2 - \dim(\mathcal{M}_1 + \mathcal{M}_2) \ge \dim \mathcal{M}_1 + \dim \mathcal{M}_2 - n,$$

for any two subspaces $\mathcal{M}_1$ and $\mathcal{M}_2$ of an $n$-dimensional vector space. Derive
the corresponding inequality for an intersection of three subspaces.


An equivalent formulation of the Poincaré inequality is in terms of com- 
pressions. Recall that if V is an isometry of a Hilbert space M into H, then 
the compression of A by V is defined to be the operator B = V* AV. Usu- 
ally we suppose that M is a subspace of H and V is the injection map. 
Then A has a block-matrix representation in which B is the northwest 


corner entry:
$$A = \begin{pmatrix} B & * \\ * & * \end{pmatrix}.$$

We say that B is the compression of A to the subspace M. 


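Corollary III.1.5 (Cauchy's Interlacing Theorem) Let $A$ be a Hermitian
operator on $\mathcal{H}$, and let $B$ be its compression to an $(n - k)$-dimensional
subspace $\mathcal{N}$. Then for $j = 1, 2, \ldots, n - k$

$$\lambda_j^{\downarrow}(A) \ge \lambda_j^{\downarrow}(B) \ge \lambda_{j+k}^{\downarrow}(A). \tag{III.1}$$


Proof. For any $j$, let $\mathcal{M}$ be the span of the eigenvectors $v_1, \ldots, v_j$ of $B$
corresponding to its eigenvalues $\lambda_1^{\downarrow}(B), \ldots, \lambda_j^{\downarrow}(B)$. Then $\langle x, Bx \rangle = \langle x, Ax \rangle$
for all $x \in \mathcal{M}$. Hence,

$$\lambda_j^{\downarrow}(B) = \min_{\substack{x \in \mathcal{M} \\ \|x\| = 1}} \langle x, Bx \rangle = \min_{\substack{x \in \mathcal{M} \\ \|x\| = 1}} \langle x, Ax \rangle \le \lambda_j^{\downarrow}(A).$$

This proves the first assertion in (III.1).
Now apply this to $-A$ and its compression $-B$ to the given subspace $\mathcal{N}$.
Note that

$$-\lambda_i^{\downarrow}(A) = \lambda_i^{\uparrow}(-A) = \lambda_{n-i+1}^{\downarrow}(-A) \quad \text{for all } 1 \le i \le n,$$
and

$$-\lambda_j^{\downarrow}(B) = \lambda_j^{\uparrow}(-B) = \lambda_{n-k-j+1}^{\downarrow}(-B) \quad \text{for all } 1 \le j \le n - k.$$

Choose $i = j + k$. Then the first inequality yields $-\lambda_j^{\downarrow}(B) \le -\lambda_{j+k}^{\downarrow}(A)$, which
is the second inequality in (III.1). ∎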


The above inequalities look especially nice when B is the compression of 
A to an (n — 1)-dimensional subspace: then they say that 


$$\alpha_1 \ge \beta_1 \ge \alpha_2 \ge \cdots \ge \beta_{n-1} \ge \alpha_n. \tag{III.2}$$


This explains why this is called an interlacing theorem. 


Exercise III.1.6 The Poincaré inequality, the minimax principle, and the
interlacing theorem can be derived from each other. Find an independent 
proof for each of them using Exercise III.1.4. (This “dimension-counting” 
for intersections of subspaces will be used in later sections too.) 


Exercise III.1.7 Let $B$ be the compression of a Hermitian operator $A$ to
an $(n - 1)$-dimensional space $\mathcal{M}$. If, for some $k$, the space $\mathcal{M}$ contains the
vectors $u_1, \ldots, u_k$, then $\beta_j = \alpha_j$ for $1 \le j \le k$. If $\mathcal{M}$ contains $u_k, \ldots, u_n$,
then $\alpha_j = \beta_{j-1}$ for $k \le j \le n$.


Exercise III.1.8 (i) Let $A_n$ be the $n \times n$ tridiagonal matrix with entries
$a_{ii} = 2\cos\theta$ for all $i$, $a_{ij} = 1$ if $|i - j| = 1$, and $a_{ij} = 0$ otherwise. The
determinant of $A_n$ is $\sin(n+1)\theta / \sin\theta$.

(ii) Show that the eigenvalues of $A_n$ are given by $2\left(\cos\theta + \cos\frac{j\pi}{n+1}\right)$,
$1 \le j \le n$.

(iii) The special case when $a_{ii} = -2$ for all $i$ arises in Rayleigh's finite-dimensional
approximation to the differential equation of a vibrating string.
In this case the eigenvalues of $A_n$ are

$$\lambda_j^{\downarrow}(A_n) = -4 \sin^2 \frac{j\pi}{2(n+1)}, \quad 1 \le j \le n.$$

(iv) Note that, for each $k < n$, the matrix $A_{n-k}$ is a compression of
$A_n$. This example provides a striking illustration of Cauchy's interlacing
theorem.
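
The claims of Exercise III.1.8 can be checked numerically without an eigenvalue solver, because the determinant of a tridiagonal matrix with constant diagonal $d$ and unit off-diagonals satisfies the three-term recurrence $D_m = d\,D_{m-1} - D_{m-2}$. The following Python sketch is only a sanity check under our own choices ($\theta = 0.7$, $n = 6$, and the helper names are assumptions); it does not replace the proofs asked for in the exercise.

```python
import math

def tridiag_det(n, diag):
    """det of the n x n tridiagonal matrix with `diag` on the diagonal and 1 on
    both adjacent diagonals, via D_m = diag * D_{m-1} - D_{m-2}."""
    prev, cur = 1.0, diag              # D_0 = 1, D_1 = diag
    for _ in range(n - 1):
        prev, cur = cur, diag * cur - prev
    return cur if n >= 1 else prev

def eigenvalues(n, theta):
    """Closed form from part (ii), listed in decreasing order."""
    return [2 * (math.cos(theta) + math.cos(j * math.pi / (n + 1)))
            for j in range(1, n + 1)]

theta, n = 0.7, 6

# (i) determinant formula
lhs = tridiag_det(n, 2 * math.cos(theta))
rhs = math.sin((n + 1) * theta) / math.sin(theta)

# (ii) each claimed eigenvalue should be a root of det(A_n - lambda I)
residuals = [tridiag_det(n, 2 * math.cos(theta) - lam) for lam in eigenvalues(n, theta)]

# (iv) A_{n-1} is a compression of A_n, so the two spectra should interlace
big, small = eigenvalues(n, theta), eigenvalues(n - 1, theta)
interlaced = all(big[j] >= small[j] >= big[j + 1] for j in range(n - 1))

print(abs(lhs - rhs) < 1e-9, max(abs(r) for r in residuals) < 1e-9, interlaced)
```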


It is illuminating to think of the variational characterisation of eigenvalues
as a solution of a variational problem in analysis. If $A$ is a Hermitian
operator on $\mathbb{R}^n$, the search for the top eigenvalue of $A$ is just the problem
of maximising the function $F(x) = x^* A x$ subject to the constraint that the
function $G(x) = x^* x$ has the fixed value 1. The extremum must occur at
a critical point, and using Lagrange multipliers the condition for a point
$x$ to be critical is $\nabla F(x) = \lambda \nabla G(x)$, which becomes $Ax = \lambda x$. Our earlier
arguments got to the extremum problem from the algebraic eigenvalue
problem, and this argument has gone the other way.

If additional constraints are imposed, the maximum can only decrease.
Confining $x$ to an $(n - k)$-dimensional subspace is equivalent to imposing
$k$ linearly independent linear constraints on it. These can be expressed as
$H_j(x) = 0$, where $H_j(x) = w_j^* x$ and the vectors $w_j$, $1 \le j \le k$, are linearly
independent. Introducing additional Lagrange multipliers $\mu_j$, the condition
for a critical point is now $\nabla F(x) = \lambda \nabla G(x) + \sum_j \mu_j \nabla H_j(x)$; i.e., $Ax - \lambda x$ is
no longer required to be 0 but merely to be a linear combination of the $w_j$.
Look at this in block-matrix terms. Our space has been decomposed into a
direct sum of a space $\mathcal{N}$ and its orthogonal complement which is spanned
by $\{w_1, \ldots, w_k\}$. Relative to this direct sum decomposition we can write

$$A = \begin{pmatrix} B & C \\ C^* & D \end{pmatrix}.$$
Our vector $x$ is now constrained to be in $\mathcal{N}$, and the requirement for it to
be a critical point is that $(A - \lambda I)\binom{x}{0}$ lies in $\mathcal{N}^{\perp}$. This is exactly requiring
$x$ to be an eigenvector of the compression $B$.
If two interlacing sets of real numbers are given, they can be realised as 


the eigenvalues of a Hermitian matrix and one of its compressions. This is 
a converse to one of the theorems proved above: 


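Theorem III.1.9 Let $\alpha_j$, $1 \le j \le n$, and $\beta_i$, $1 \le i \le n - 1$, be real numbers
such that

$$\alpha_1 \ge \beta_1 \ge \alpha_2 \ge \cdots \ge \beta_{n-1} \ge \alpha_n.$$

Then there exists a compression of the diagonal matrix $A = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$
having $\beta_i$, $1 \le i \le n - 1$, as its eigenvalues.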


Proof. Let $Au_j = \alpha_j u_j$; then $\{u_j\}$ constitute the standard orthonormal
basis in $\mathbb{C}^n$. There is a one-to-one correspondence between $(n - 1)$-dimensional
orthogonal projection operators and unit vectors given by
$P = I - zz^*$. Each unit vector, in turn, is completely characterised by
its coordinates $\zeta_j$ with respect to the basis $u_j$. We have $z = \sum \zeta_j u_j = \sum (u_j^* z) u_j$,
$\sum |\zeta_j|^2 = 1$. We will find conditions on the numbers $\zeta_j$ so that,
for the corresponding orthoprojector $P = I - zz^*$, the compression of $A$ to
the range of $P$ has eigenvalues $\beta_i$.
Since $PAP$ is a Hermitian operator of rank $n - 1$, we must have

$$\prod_{i=1}^{n-1} (\lambda - \beta_i) = \operatorname{tr} \wedge^{n-1} [P(\lambda I - A)P].$$

If $E_j$ are the projectors defined as $E_j = I - u_j u_j^*$, then
$$\wedge^{n-1}(\lambda I - A) = \sum_{j=1}^n \Big[\prod_{k \ne j} (\lambda - \alpha_k)\Big] \wedge^{n-1} E_j.$$

Using the result of Problem I.6.9 one sees that

$$\wedge^{n-1} P \cdot \wedge^{n-1} E_j \cdot \wedge^{n-1} P = |\zeta_j|^2 \wedge^{n-1} P.$$
Since rank $\wedge^{n-1} P = 1$, the above three relations give

$$\prod_{i=1}^{n-1} (\lambda - \beta_i) = \sum_{j=1}^n |\zeta_j|^2 \Big[\prod_{k \ne j} (\lambda - \alpha_k)\Big], \tag{III.3}$$

an identity between polynomials of degree $n - 1$, which the $\zeta_j$ must satisfy
if $B$ has spectrum $\{\beta_i\}$.
We will show that the interlacing inequalities between $\alpha_j$ and $\beta_i$ ensure
that we can find $\zeta_j$ satisfying (III.3) and $\sum_{j=1}^n |\zeta_j|^2 = 1$. We may assume,
without loss of generality, that the $\alpha_j$ are distinct. Put

$$\gamma_j = \frac{\prod_{i=1}^{n-1} (\alpha_j - \beta_i)}{\prod_{k \ne j} (\alpha_j - \alpha_k)}, \quad 1 \le j \le n. \tag{III.4}$$

The interlacing property ensures that all $\gamma_j$ are nonnegative. Now choose
$\zeta_j$ to be any complex numbers with $|\zeta_j|^2 = \gamma_j$. Then the equation (III.3) is
satisfied for the values $\lambda = \alpha_j$, $1 \le j \le n$, and hence it is satisfied for all $\lambda$.
Comparing the leading coefficients of the two sides of (III.3), we see that
$\sum_j |\zeta_j|^2 = 1$. This completes the proof. ∎
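
The construction in this proof is explicit enough to try out. The Python sketch below is an illustration with our own sample data and helper names (`gamma_weights`, `lhs`, `rhs` are assumptions). It computes the weights $\gamma_j$ of (III.4) in exact rational arithmetic, then checks that they are nonnegative, that they sum to 1, and that the polynomial identity (III.3) holds at a range of sample values of $\lambda$.

```python
from fractions import Fraction
from math import prod

def gamma_weights(alpha, beta):
    """gamma_j from (III.4); the alpha_j are assumed to be distinct."""
    n = len(alpha)
    return [
        Fraction(prod(alpha[j] - b for b in beta),
                 prod(alpha[j] - alpha[k] for k in range(n) if k != j))
        for j in range(n)
    ]

def lhs(lam, beta):
    """Left side of (III.3): product of (lam - beta_i)."""
    return prod(lam - b for b in beta)

def rhs(lam, alpha, gamma):
    """Right side of (III.3) with |zeta_j|^2 replaced by gamma_j."""
    n = len(alpha)
    return sum(gamma[j] * prod(lam - alpha[k] for k in range(n) if k != j)
               for j in range(n))

alpha = [7, 4, 2, 0]       # alpha_1 > alpha_2 > alpha_3 > alpha_4
beta = [5, 3, 1]           # interlaces alpha
gamma = gamma_weights(alpha, beta)

print(gamma)               # [16/35, 1/8, 3/20, 15/56] as Fractions
print(sum(gamma))          # 1
print(all(lhs(t, beta) == rhs(t, alpha, gamma) for t in range(-3, 10)))   # True
```

Any unit vector $z$ with $|\zeta_j|^2 = \gamma_j$ then gives a projection $P = I - zz^*$ of the kind used in the proof.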


III.2 Weyl’s Inequalities 


Several relations between eigenvalues of Hermitian matrices A, B, and A+B 
can be obtained using the ideas of the previous section. Most of these results 
were first proved by H. Weyl. 


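Theorem III.2.1 Let $A, B$ be $n \times n$ Hermitian matrices. Then,

$$\lambda_j^{\downarrow}(A + B) \le \lambda_i^{\downarrow}(A) + \lambda_{j-i+1}^{\downarrow}(B) \quad \text{for } i \le j, \tag{III.5}$$

$$\lambda_j^{\downarrow}(A + B) \ge \lambda_i^{\downarrow}(A) + \lambda_{j-i+n}^{\downarrow}(B) \quad \text{for } i \ge j. \tag{III.6}$$
Proof. Let $u_j$, $v_j$, and $w_j$ denote the eigenvectors of $A$, $B$, and $A + B$
respectively, corresponding to their eigenvalues in decreasing order. Let
$i \le j$. Consider the three subspaces spanned by $\{w_1, \ldots, w_j\}$, $\{u_i, \ldots, u_n\}$,
and $\{v_{j-i+1}, \ldots, v_n\}$ respectively. These have dimensions $j$, $n - i + 1$, and
$n - j + i$, and hence by Exercise III.1.4 they have a nontrivial intersection.
Let $x$ be a unit vector in their intersection. Then
$$\lambda_j^{\downarrow}(A + B) \le \langle x, (A + B)x \rangle = \langle x, Ax \rangle + \langle x, Bx \rangle \le \lambda_i^{\downarrow}(A) + \lambda_{j-i+1}^{\downarrow}(B).$$

This proves (III.5). If $A$ and $B$ in this inequality are replaced by $-A$ and
$-B$, we get (III.6). ∎

Corollary III.2.2 For each $j = 1, 2, \ldots, n$,
$$\lambda_j^{\downarrow}(A) + \lambda_n^{\downarrow}(B) \le \lambda_j^{\downarrow}(A + B) \le \lambda_j^{\downarrow}(A) + \lambda_1^{\downarrow}(B). \tag{III.7}$$
Proof. Put $i = j$ in the above inequalities. ∎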
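
Weyl's inequalities lend themselves to a randomised sanity check. The Python sketch below is an illustration under stated assumptions: it works with real symmetric matrices only, and since the standard library has no eigenvalue routine it uses a small cyclic Jacobi rotation solver of our own (`sym_eigvals`), which is not part of the original text. It tests (III.5) and (III.6) for every admissible index pair on 200 random $4 \times 4$ examples.

```python
import math
import random

def sym_eigvals(a, sweeps=30):
    """Eigenvalues (decreasing) of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # rotation angle that makes the (p, q) entry vanish
                phi = 0.5 * math.atan(2 * a[p][q] / diff) if diff != 0 else math.pi / 4
                c, s = math.cos(phi), math.sin(phi)
                for k in range(n):                 # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                 # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def random_symmetric(n, rng):
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.uniform(-1, 1)
    return m

rng = random.Random(1)
n, tol, violations = 4, 1e-9, 0
for _ in range(200):
    A, B = random_symmetric(n, rng), random_symmetric(n, rng)
    S = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]
    a, b, s = sym_eigvals(A), sym_eigvals(B), sym_eigvals(S)
    for j in range(1, n + 1):                      # 1-based indices as in the text
        for i in range(1, n + 1):
            if i <= j and s[j - 1] > a[i - 1] + b[j - i] + tol:             # (III.5)
                violations += 1
            if i >= j and s[j - 1] < a[i - 1] + b[j - i + n - 1] - tol:     # (III.6)
                violations += 1

print(violations)   # 0
```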


It is customary to state these and related results as perturbation theorems,
whereby $B$ is a perturbation of $A$; that is, $B = A + H$. In many of the
applications $H$ is small and the object is to give bounds for the distance of
$\lambda(B)$ from $\lambda(A)$ in terms of $H = B - A$.


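Corollary III.2.3 (Weyl's Monotonicity Theorem) If $H$ is positive, then
$$\lambda_j^{\downarrow}(A + H) \ge \lambda_j^{\downarrow}(A) \quad \text{for all } j.$$


Proof. By the preceding corollary, $\lambda_j^{\downarrow}(A + H) \ge \lambda_j^{\downarrow}(A) + \lambda_n^{\downarrow}(H)$, but all
the eigenvalues of $H$ are nonnegative. Alternately, note that $\langle x, (A + H)x \rangle \ge \langle x, Ax \rangle$
for all $x$ and use the minimax principle. ∎


Exercise III.2.4 If $H$ is positive and has rank $k$, then
$$\lambda_j^{\downarrow}(A + H) \ge \lambda_j^{\downarrow}(A) \ge \lambda_{j+k}^{\downarrow}(A + H) \quad \text{for } j = 1, 2, \ldots, n - k.$$
This is analogous to Cauchy's interlacing theorem.

Exercise III.2.5 Let $H$ be any Hermitian matrix. Then
$$\lambda_j^{\downarrow}(A) - \|H\| \le \lambda_j^{\downarrow}(A + H) \le \lambda_j^{\downarrow}(A) + \|H\|.$$
This can be restated as:


Corollary III.2.6 (Weyl's Perturbation Theorem) Let $A$ and $B$ be Hermitian
matrices. Then

$$\max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|.$$

Exercise III.2.7 For Hermitian matrices $A, B$, we have

$$\|A - B\| \le \max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\uparrow}(B)|.$$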


It is useful to have another formulation of the above two inequalities, 
which will be in conformity with more general results proved later. 

We will denote by Eig $A$ a diagonal matrix whose diagonal entries are
the eigenvalues of $A$. If these are arranged in decreasing order, we write
this matrix as $\operatorname{Eig}^{\downarrow}(A)$; if in increasing order as $\operatorname{Eig}^{\uparrow}(A)$. The results of
Corollary III.2.6 and Exercise III.2.7 can then be stated as

Theorem III.2.8 For any two Hermitian matrices $A, B$,
$$\|\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\downarrow}(B)\| \le \|A - B\| \le \|\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\uparrow}(B)\|.$$
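
For $2 \times 2$ real symmetric matrices everything in Theorem III.2.8 has a closed form, which makes a quick check possible. The Python sketch below is our own illustration (the helpers `eig2`, `op_norm`, `random_symmetric2` are assumptions, and the restriction to $2 \times 2$ real matrices is made only for simplicity). It uses the fact that the operator norm of a symmetric matrix is its largest eigenvalue in absolute value.

```python
import math
import random

def eig2(m):
    """Eigenvalues (decreasing) of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return [mid + rad, mid - rad]

def op_norm(m):
    """Operator norm of a symmetric 2x2 matrix: the largest |eigenvalue|."""
    return max(abs(t) for t in eig2(m))

def random_symmetric2(rng):
    a, b, d = (rng.uniform(-1, 1) for _ in range(3))
    return [[a, b], [b, d]]

rng = random.Random(2)
tol = 1e-12
holds = True
for _ in range(1000):
    A, B = random_symmetric2(rng), random_symmetric2(rng)
    diff = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
    a, b = eig2(A), eig2(B)
    lower = max(abs(a[j] - b[j]) for j in range(2))        # ||Eig_down(A) - Eig_down(B)||
    upper = max(abs(a[j] - b[1 - j]) for j in range(2))    # ||Eig_down(A) - Eig_up(B)||
    holds = holds and lower <= op_norm(diff) + tol <= upper + 2 * tol

print(holds)   # True
```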


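Weyl's inequality (III.5) is equivalent to an inequality due to Aronszajn
connecting the eigenvalues of a Hermitian matrix to those of any two
complementary principal submatrices. For this let us rewrite (III.5) as

$$\lambda_{i+j-1}^{\downarrow}(A + B) \le \lambda_i^{\downarrow}(A) + \lambda_j^{\downarrow}(B), \tag{III.8}$$
for all indices $i, j$ such that $i + j - 1 \le n$.

Theorem III.2.9 (Aronszajn's Inequality) Let $C$ be an $n \times n$ Hermitian
matrix partitioned as
$$C = \begin{pmatrix} A & X \\ X^* & B \end{pmatrix},$$

where $A$ is a $k \times k$ matrix. Let the eigenvalues of $A$, $B$, and $C$ be $\alpha_1 \ge \cdots \ge \alpha_k$,
$\beta_1 \ge \cdots \ge \beta_{n-k}$, and $\gamma_1 \ge \cdots \ge \gamma_n$, respectively. Then

$$\gamma_{i+j-1} + \gamma_n \le \alpha_i + \beta_j \quad \text{for all } i, j \text{ with } i + j - 1 \le n. \tag{III.9}$$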


Proof. First assume that $\gamma_n = 0$. Then $C$ is a positive matrix. Hence
$C = D^* D$ for some matrix $D$. Partition $D$ as $D = (D_1 \ D_2)$, where $D_1$ has
$k$ columns. Then

$$C = \begin{pmatrix} A & X \\ X^* & B \end{pmatrix} = \begin{pmatrix} D_1^* D_1 & D_1^* D_2 \\ D_2^* D_1 & D_2^* D_2 \end{pmatrix}.$$

Note that $DD^* = D_1 D_1^* + D_2 D_2^*$. Now the nonzero eigenvalues of the
matrix $C = D^* D$ are the same as those of $DD^*$. The same is true for the
matrices $A = D_1^* D_1$ and $D_1 D_1^*$, and also for the matrices $B = D_2^* D_2$ and
$D_2 D_2^*$. Hence, using Weyl's inequality (III.8) we get (III.9) in this special
case.

If $\gamma_n \ne 0$, subtract $\gamma_n I$ from $C$. Then all eigenvalues of $A$, $B$, and $C$ are
translated by $-\gamma_n$. By the special case considered above we have

$$\gamma_{i+j-1} - \gamma_n \le (\alpha_i - \gamma_n) + (\beta_j - \gamma_n),$$
which is the same as (III.9). ∎


We have derived Aronszajn's inequality from Weyl's inequality. But the
argument above can be reversed. Let $A, B$ be $n \times n$ Hermitian matrices and
let $C = A + B$. Let the eigenvalues of these matrices be $\alpha_1 \ge \cdots \ge \alpha_n$,
$\beta_1 \ge \cdots \ge \beta_n$, and $\gamma_1 \ge \cdots \ge \gamma_n$, respectively. We want to prove that
$\gamma_{i+j-1} \le \alpha_i + \beta_j$. This is the same as $\gamma_{i+j-1} - (\alpha_n + \beta_n) \le (\alpha_i - \alpha_n) + (\beta_j - \beta_n)$.
Hence, we can assume, without loss of generality, that both $A$ and $B$ are
positive. Then $A = D_1^* D_1$ and $B = D_2^* D_2$ for some matrices $D_1, D_2$. Hence,

$$C = D_1^* D_1 + D_2^* D_2 = (D_1^* \ D_2^*) \begin{pmatrix} D_1 \\ D_2 \end{pmatrix}.$$
Consider the $2n \times 2n$ matrix

$$E = \begin{pmatrix} D_1 \\ D_2 \end{pmatrix} (D_1^* \ D_2^*) = \begin{pmatrix} D_1 D_1^* & D_1 D_2^* \\ D_2 D_1^* & D_2 D_2^* \end{pmatrix}.$$
Then the eigenvalues of $E$ are the eigenvalues of $C$ together with $n$ zeroes.
Aronszajn's inequality for the partitioned matrix $E$ then gives Weyl's
inequality (III.8).
By this procedure, several linear inequalities for the eigenvalues of a sum 


of Hermitian matrices can be transformed to those for the eigenvalues of 
block Hermitian matrices, and vice versa. 


III.3 Wielandt’s Minimax Principle 


The minimax principle (Corollary III.1.2) gives an extremal characterisation
for each eigenvalue $\alpha_j$ of a Hermitian matrix $A$. Ky Fan's maximum
principle (Problem I.6.15 and Exercise II.1.13) provides an extremal
characterisation for the sum $\alpha_1 + \cdots + \alpha_k$ of the top $k$ eigenvalues of $A$. In
this section we will prove a deeper result due to Wielandt that subsumes
both these principles by providing an extremal representation of any sum
$\alpha_{i_1} + \cdots + \alpha_{i_k}$. The proof involves a more elaborate dimension-counting for
intersections of subspaces than was needed earlier.

We will denote by $V + W$ the vector sum of two vector spaces $V$ and $W$, by
$V - W$ any linear complement of a space $W$ in $V$, and by
span $\{v_1, \ldots, v_k\}$ the linear span of vectors $v_1, \ldots, v_k$.


Lemma III.3.1 Let $W_1 \supset W_2 \supset \cdots \supset W_k$ be a decreasing chain of vector
spaces with $\dim W_j \ge k - j + 1$. Let $w_j$, $1 \le j \le k - 1$, be linearly independent
vectors such that $w_j \in W_j$, and let $U$ be their linear span. Then there exists
a nonzero vector $u$ in $W_1 - U$ such that the space $U + \operatorname{span}\{u\}$ has a basis
$v_1, \ldots, v_k$ with $v_j \in W_j$, $1 \le j \le k$.


Proof. This will be proved by induction on $k$. The statement is easily
verified when $k = 2$. Assume that it is true for a chain consisting of $k - 1$
spaces. Let $w_1, \ldots, w_{k-1}$ be the given vectors and $U$ their linear span. Let
$S$ be the linear span of $w_2, \ldots, w_{k-1}$. Apply the induction hypothesis to the
chain $W_2 \supset \cdots \supset W_k$ to pick up a vector $v$ in $W_2 - S$ such that the space
$S + \operatorname{span}\{v\}$ is equal to $\operatorname{span}\{v_2, \ldots, v_k\}$ for some linearly independent
vectors $v_j \in W_j$, $j = 2, \ldots, k$. This vector $v$ may or may not be in the space
$U$. We will consider the two possibilities. Suppose $v \in U$. Then $U = S + \operatorname{span}\{v\}$
because $U$ is $(k-1)$-dimensional and $S$ is $(k-2)$-dimensional. Since
$\dim W_1 \ge k$, there exists a nonzero vector $u$ in $W_1 - U$. Then $u, v_2, \ldots, v_k$
form a basis for $U + \operatorname{span}\{u\}$. Put $v_1 = u$. All requirements are now met.
Suppose $v \notin U$. Then $w_1 \notin S + \operatorname{span}\{v\}$, for if $w_1$ were a linear combination
of $w_2, \ldots, w_{k-1}$ and $v$, then $v$ would be a linear combination of
$w_1, w_2, \ldots, w_{k-1}$ and hence be an element of $U$. So, $\operatorname{span}\{w_1, v_2, \ldots, v_k\}$ is
a $k$-dimensional space that must, therefore, be $U + \operatorname{span}\{v\}$. Now $w_1 \in W_1$
and $v_j \in W_j$, $j = 2, \ldots, k$. Again all requirements are met. ∎


Theorem III.3.2 Let $V_1 \subset V_2 \subset \cdots \subset V_k$ be linear
subspaces of an $n$-dimensional vector space $V$, with $\dim V_j = i_j$,
$1 \le i_1 < i_2 < \cdots < i_k \le n$. Let
$W_1 \supset W_2 \supset \cdots \supset W_k$ be subspaces of $V$, with
$\dim W_j = n - i_j + 1 = \mathrm{codim}\, V_j + 1$. Then there exist linearly
independent vectors $v_j \in V_j$, $1 \le j \le k$, and linearly independent
vectors $w_j \in W_j$, $1 \le j \le k$, such that

$$\mathrm{span}\{v_1, \ldots, v_k\} = \mathrm{span}\{w_1, \ldots, w_k\}.$$


Proof. When $k = 1$ the statement is obviously true. (We have used this
repeatedly in the earlier sections.) The general case will be proved by
induction on $k$. So, let us assume that the theorem has been proved for
$k - 1$ pairs of subspaces. By the induction hypothesis choose $v_j \in V_j$
and $w_j \in W_j$, $1 \le j \le k-1$, two sets of linearly independent vectors
having the same linear span $U$. Note that $U$ is a subspace of $V_k$.

For $j = 1, \ldots, k$, let $S_j = W_j \cap V_k$. Then note that

$$n \ge \dim W_j + \dim V_k - \dim S_j = (n - i_j + 1) + i_k - \dim S_j.$$

Hence,

$$\dim S_j \ge i_k - i_j + 1 \ge k - j + 1.$$

Note that $S_1 \supset S_2 \supset \cdots \supset S_k$ are subspaces of $V_k$
and $w_j \in S_j$ for $j = 1, 2, \ldots, k-1$. Hence, by Lemma III.3.1 there
exists a vector $u$ in $S_1 - U$ such that the space $U + \mathrm{span}\{u\}$
has a basis $u_1, \ldots, u_k$, where $u_j \in S_j \subset W_j$,
$j = 1, 2, \ldots, k$. But $U + \mathrm{span}\{u\}$ is also the linear span of
$v_1, \ldots, v_{k-1}$ and $u$. Put $v_k = u$. Then $v_j \in V_j$,
$j = 1, 2, \ldots, k$, and they span the same space as the $u_j$. ■


Exercise III.3.3 If $V$ is a Hilbert space, the vectors $v_j$ and $w_j$ in the
statement of the above theorem can be chosen to be orthonormal.


Proposition III.3.4 Let $A$ be a Hermitian operator on $H$ with eigenvectors
$u_j$ belonging to eigenvalues $\lambda_j^{\downarrow}(A)$, $j = 1, 2, \ldots, n$.

(i) Let $V_j = \mathrm{span}\{u_1, \ldots, u_j\}$, $1 \le j \le n$. Given indices
$1 \le i_1 < \cdots < i_k \le n$, choose orthonormal vectors $x_{i_j}$ from the
spaces $V_{i_j}$, $j = 1, \ldots, k$. Let $V$ be the span of these vectors, and
let $A_V$ be the compression of $A$ to the space $V$. Then

$$\lambda_j^{\downarrow}(A_V) \ge \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, \ldots, k.$$

(ii) Let $W_j = \mathrm{span}\{u_j, \ldots, u_n\}$, $1 \le j \le n$. Choose
orthonormal vectors $x_{i_j}$ from the spaces $W_{i_j}$, $j = 1, \ldots, k$. Let
$W$ be the span of these vectors and $A_W$ the compression of $A$ to $W$. Then

$$\lambda_j^{\downarrow}(A_W) \le \lambda_{i_j}^{\downarrow}(A) \quad \text{for } j = 1, \ldots, k.$$


Proof. Let $y_1, \ldots, y_k$ be the eigenvectors of $A_V$ belonging to its
eigenvalues $\lambda_1^{\downarrow}(A_V), \ldots, \lambda_k^{\downarrow}(A_V)$.
Fix $j$, $1 \le j \le k$, and in the space $V$ consider the spaces spanned by
$\{x_{i_1}, \ldots, x_{i_j}\}$ and $\{y_j, \ldots, y_k\}$, respectively. The
dimensions of these two spaces add up to $k + 1$, while the space $V$ is
$k$-dimensional. Hence there exists a unit vector $u$ in the intersection of
these two spaces. For this vector we have

$$\lambda_j^{\downarrow}(A_V) \ge \langle u, A_V u \rangle = \langle u, A u \rangle \ge \lambda_{i_j}^{\downarrow}(A).$$

This proves (i). The statement (ii) has exactly the same proof. ■
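
Part (i) of Proposition III.3.4 can be checked on random data. The sketch below assumes NumPy and, for simplicity, a real symmetric $A$; the helper name `compression_eigs` and the way the orthonormal vectors are drawn (random vectors in the nested spaces, followed by Gram-Schmidt) are choices made here for illustration only.

```python
import numpy as np

def compression_eigs(A, indices, rng):
    """For a real symmetric A and 1-based indices i_1 < ... < i_k, pick
    orthonormal x_j in V_{i_j} = span{u_1, ..., u_{i_j}} and return
    (eigenvalues of the compression A_V, the eigenvalues lambda_{i_j}(A)),
    both in decreasing order. Hypothetical helper for illustration."""
    lam, U = np.linalg.eigh(A)
    lam, U = lam[::-1], U[:, ::-1]             # decreasing eigenvalues, matching eigenvectors
    xs = []
    for i in indices:
        x = U[:, :i] @ rng.standard_normal(i)  # a random vector in V_i
        for y in xs:                           # earlier vectors lie in V_i as well,
            x = x - (y @ x) * y                # so Gram-Schmidt stays inside V_i
        xs.append(x / np.linalg.norm(x))
    X = np.column_stack(xs)
    A_V = X.T @ A @ X                          # matrix of the compression in the basis x_j
    mu = np.linalg.eigvalsh(A_V)[::-1]
    return mu, lam[np.array(indices) - 1]

rng = np.random.default_rng(1)
G = rng.standard_normal((6, 6))
A = (G + G.T) / 2
mu, selected = compression_eigs(A, (1, 3, 6), rng)
print(np.all(mu >= selected - 1e-9))
```

Part (ii) can be tested the same way by replacing the leading eigenvectors `U[:, :i]` with the trailing ones `U[:, i-1:]` and reversing the inequality.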
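
Theorem III.3.5 (Wielandt's Minimax Principle) Let $A$ be a Hermitian
operator on an $n$-dimensional space $H$. Then for any indices
$1 \le i_1 < \cdots < i_k \le n$ we have

$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A)
= \max_{\substack{M_1 \subset \cdots \subset M_k \\ \dim M_j = i_j}} \;
  \min_{\substack{x_j \in M_j \\ x_j \text{ orthonormal}}} \;
  \sum_{j=1}^{k} \langle x_j, A x_j \rangle
= \min_{\substack{N_1 \supset \cdots \supset N_k \\ \dim N_j = n - i_j + 1}} \;
  \max_{\substack{x_j \in N_j \\ x_j \text{ orthonormal}}} \;
  \sum_{j=1}^{k} \langle x_j, A x_j \rangle.$$

Proof. We will prove the first statement; the second has a similar proof.
Let $V_{i_j} = \mathrm{span}\{u_1, \ldots, u_{i_j}\}$, where, as before, the
$u_j$ are eigenvectors of $A$ corresponding to $\lambda_j^{\downarrow}(A)$. For
any unit vector $x$ in $V_{i_j}$, $\langle x, Ax \rangle \ge \lambda_{i_j}^{\downarrow}(A)$.
So, if $x_j \in V_{i_j}$ are orthonormal vectors, then

$$\sum_{j=1}^{k} \langle x_j, A x_j \rangle \ge \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$

Since the $x_j$ were quite arbitrary, we have

$$\min_{\substack{x_j \in V_{i_j} \\ x_j \text{ orthonormal}}} \sum_{j=1}^{k} \langle x_j, A x_j \rangle \ge \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$

Hence, the desired result will be achieved if we prove that given any
subspaces $M_1 \subset \cdots \subset M_k$ with $\dim M_j = i_j$ we can find
orthonormal vectors $x_j \in M_j$ such that

$$\sum_{j=1}^{k} \langle x_j, A x_j \rangle \le \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$

Let $N_j = W_{i_j} = \mathrm{span}\{u_{i_j}, \ldots, u_n\}$, $j = 1, 2, \ldots, k$.
These spaces were considered in Proposition III.3.4(ii). We have
$N_1 \supset N_2 \supset \cdots \supset N_k$ and $\dim N_j = n - i_j + 1$.
Hence, by Theorem III.3.2 and Exercise III.3.3 there exist orthonormal vectors
$x_j \in M_j$ and orthonormal vectors $y_j \in N_j$ such that

$$\mathrm{span}\{x_1, \ldots, x_k\} = \mathrm{span}\{y_1, \ldots, y_k\} = W, \text{ say}.$$

By Proposition III.3.4(ii), $\lambda_j^{\downarrow}(A_W) \le \lambda_{i_j}^{\downarrow}(A)$
for $j = 1, 2, \ldots, k$. Hence,

$$\sum_{j=1}^{k} \langle x_j, A x_j \rangle = \sum_{j=1}^{k} \langle x_j, A_W x_j \rangle = \mathrm{tr}\, A_W
= \sum_{j=1}^{k} \lambda_j^{\downarrow}(A_W) \le \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A).$$

This is what we wanted to prove. ■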


Exercise III.3.6 Note that

$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) = \sum_{j=1}^{k} \langle u_{i_j}, A u_{i_j} \rangle.$$

We have seen that the maximum in the first assertion of Theorem III.3.5
is attained when $M_j = V_{i_j} = \mathrm{span}\{u_1, \ldots, u_{i_j}\}$,
$j = 1, \ldots, k$, and with this choice the minimum is attained for
$x_j = u_{i_j}$, $j = 1, \ldots, k$. Are there other choices of subspaces and
vectors for which these extrema are attained? (See Exercise III.1.3.)


Exercise III.3.7 Let $[a, b]$ be an interval containing all eigenvalues of $A$
and let $\Phi(t_1, \ldots, t_k)$ be any real valued function on
$[a, b] \times \cdots \times [a, b]$ that is monotone in each variable and
permutation-invariant. Show that for each choice of indices
$1 \le i_1 < \cdots < i_k \le n$,

$$\Phi\bigl(\lambda_{i_1}^{\downarrow}(A), \ldots, \lambda_{i_k}^{\downarrow}(A)\bigr)
= \max_{\substack{M_1 \subset \cdots \subset M_k \\ \dim M_j = i_j}} \;
  \min_{\substack{W = \mathrm{span}\{x_1, \ldots, x_k\} \\ x_j \in M_j,\ x_j \text{ orthonormal}}} \;
  \Phi\bigl(\lambda_1^{\downarrow}(A_W), \ldots, \lambda_k^{\downarrow}(A_W)\bigr),$$

where $A_W$ is the compression of $A$ to the space $W$. In Theorem III.3.5 we
have proved the special case of this with
$\Phi(t_1, \ldots, t_k) = t_1 + \cdots + t_k$.


III.4 Lidskii's Theorems


One important application of Wielandt’s minimax principle is in proving a 
theorem of Lidskii giving a relationship between eigenvalues of Hermitian 
matrices A, B and A + B. This is quite like our derivation of some of the
results in Section III.2 from those in Section III.1. 


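Theorem III.4.1 Let $A, B$ be Hermitian matrices. Then for any choice
of indices $1 \le i_1 < \cdots < i_k \le n$,

$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A + B) \le \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B). \qquad \text{(III.10)}$$


Proof. By Theorem III.3.5 there exist subspaces $M_1 \subset \cdots \subset M_k$,
with $\dim M_j = i_j$, such that

$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A + B) = \min_{\substack{x_j \in M_j \\ x_j \text{ orthonormal}}} \sum_{j=1}^{k} \langle x_j, (A + B) x_j \rangle.$$

By Ky Fan's maximum principle

$$\sum_{j=1}^{k} \langle x_j, B x_j \rangle \le \sum_{j=1}^{k} \lambda_j^{\downarrow}(B),$$

for any choice of orthonormal vectors $x_1, \ldots, x_k$. The above two
relations imply that

$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A + B) \le \min_{\substack{x_j \in M_j \\ x_j \text{ orthonormal}}} \sum_{j=1}^{k} \langle x_j, A x_j \rangle + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B).$$

Now, using Theorem III.3.5 once again, it can be concluded that the first
term on the right-hand side of the above inequality is dominated by
$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A)$. ■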

Corollary III.4.2 If $A, B$ are Hermitian matrices, then the eigenvalues
of $A$, $B$, and $A + B$ satisfy the following majorisation relation

$$\lambda^{\downarrow}(A + B) - \lambda^{\downarrow}(A) \prec \lambda(B). \qquad \text{(III.11)}$$


Exercise III.4.3 (Lidskii's Theorem) The vector $\lambda^{\downarrow}(A + B)$ is
in the convex hull of the vectors $\lambda^{\downarrow}(A) + P\lambda^{\downarrow}(B)$,
where $P$ varies over all permutation matrices. [This statement and those of
Theorem III.4.1 and Corollary III.4.2 are, in fact, equivalent to each other.]
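
A brute-force check of (III.10) over every index set is feasible for small $n$. The following sketch assumes NumPy; the helper names `lidskii_slack` and `random_hermitian` are hypothetical. It returns the smallest value of the right side minus the left side of (III.10), which should be nonnegative up to rounding, and essentially zero because the case $k = n$ is the trace identity.

```python
import numpy as np
from itertools import combinations

def lidskii_slack(A, B):
    """Smallest value of (right side) - (left side) of (III.10) over all
    index sets 1 <= i_1 < ... < i_k <= n. Hypothetical helper."""
    a = np.linalg.eigvalsh(A)[::-1]        # lambda^down(A)
    b = np.linalg.eigvalsh(B)[::-1]        # lambda^down(B)
    c = np.linalg.eigvalsh(A + B)[::-1]    # lambda^down(A + B)
    n = len(a)
    slack = np.inf
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):   # 0-based positions of i_1 < ... < i_k
            idx = list(idx)
            slack = min(slack, a[idx].sum() + b[:k].sum() - c[idx].sum())
    return slack

def random_hermitian(n, rng):
    """A random n x n Hermitian matrix (illustrative test data)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(2)
print(lidskii_slack(random_hermitian(5, rng), random_hermitian(5, rng)) >= -1e-9)
```

The loop visits $2^n - 1$ index sets, so this is meant only for small matrices.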


Lidskii’s Theorem can be proved without calling upon the more intricate 
Wielandt’s principle. We will see several other proofs in this book, each 
highlighting a different viewpoint. The second proof given below is in the 
spirit of other results of this chapter. 
Lidskii's Theorem (second proof). We will prove Theorem III.4.1 by
induction on the dimension $n$. Its statement is trivial when $n = 1$. Assume
it is true up to dimension $n - 1$. When $k = n$, the inequality (III.10) needs
no proof. So we may assume that $k < n$.

Let $u_j$, $v_j$, and $w_j$ be the eigenvectors of $A$, $B$, and $A + B$
corresponding to their eigenvalues $\lambda_j^{\downarrow}(A)$,
$\lambda_j^{\downarrow}(B)$, and $\lambda_j^{\downarrow}(A + B)$. We will
consider three cases separately.


Case 1. $i_k < n$. Let $M = \mathrm{span}\{w_1, \ldots, w_{n-1}\}$ and let $A_M$
be the compression of $A$ to the space $M$. Then, by the induction hypothesis

$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A_M + B_M) \le \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A_M) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B_M).$$

The inequality (III.10) follows from this by using the interlacing principle
(III.2) and Exercise III.1.7.


Case 2. $1 < i_1$. Let $M = \mathrm{span}\{u_2, \ldots, u_n\}$. By the
induction hypothesis

$$\sum_{j=1}^{k} \lambda_{i_j - 1}^{\downarrow}(A_M + B_M) \le \sum_{j=1}^{k} \lambda_{i_j - 1}^{\downarrow}(A_M) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B_M).$$

Once again, the inequality (III.10) follows from this by using the interlacing
principle and Exercise III.1.7.


Case 3. $i_1 = 1$. Given the indices $1 = i_1 < i_2 < \cdots < i_k \le n$, pick
up the indices $1 \le \ell_1 < \ell_2 < \cdots < \ell_{n-k} < n$ such that the
set $\{i_j : 1 \le j \le k\}$ is the complement of the set
$\{n - \ell_j + 1 : 1 \le j \le n - k\}$ in the set $\{1, 2, \ldots, n\}$. These
new indices now come under Case 1. Use (III.10) for this set of indices, but
for matrices $-A$ and $-B$ in place of $A, B$. Then note that
$\lambda_j^{\downarrow}(-A) = -\lambda_{n-j+1}^{\downarrow}(A)$ for all
$1 \le j \le n$. This gives

$$\sum_{j=1}^{n-k} -\lambda_{n-\ell_j+1}^{\downarrow}(A + B) \le \sum_{j=1}^{n-k} -\lambda_{n-\ell_j+1}^{\downarrow}(A) + \sum_{j=1}^{n-k} -\lambda_{n-j+1}^{\downarrow}(B).$$

Now add $\mathrm{tr}(A + B)$ to both sides of the above inequality to get

$$\sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A + B) \le \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_j^{\downarrow}(B).$$

This proves the theorem. ■


As in Section III.2, it is useful to interpret the above results as
perturbation theorems. The following statement for Hermitian matrices $A, B$
can be derived from (III.11) by changing variables:

$$\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B) \prec \lambda(A - B) \prec \lambda^{\downarrow}(A) - \lambda^{\uparrow}(B). \qquad \text{(III.12)}$$

This can also be written as

$$\lambda^{\downarrow}(A) + \lambda^{\uparrow}(B) \prec \lambda(A + B) \prec \lambda^{\downarrow}(A) + \lambda^{\downarrow}(B). \qquad \text{(III.13)}$$


In fact, the two right-hand majorisations are consequences of the weaker 
maximum principle of Ky Fan. 
As a consequence of (III.12) we have: 


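Theorem III.4.4 Let $A, B$ be Hermitian matrices and let $\Phi$ be any
symmetric gauge function on $\mathbb{R}^n$. Then

$$\Phi\bigl(\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B)\bigr) \le \Phi\bigl(\lambda(A - B)\bigr) \le \Phi\bigl(\lambda^{\downarrow}(A) - \lambda^{\uparrow}(B)\bigr).$$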


Note that Weyl’s perturbation theorem (Corollary III.2.6) and the in- 
equality in Exercise III.2.7 are very special cases of this theorem. 
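
The $\ell_p$ norms on $\mathbb{R}^n$ are symmetric gauge functions, so Theorem III.4.4 can be illustrated with them. The sketch below assumes NumPy; the helper name `gauge_chain` is hypothetical, and only $p = 1, 2, \infty$ are tried. It returns the three quantities of the theorem in order, which should be nondecreasing.

```python
import numpy as np

def gauge_chain(A, B, p):
    """The three numbers in Theorem III.4.4 for Phi = the l_p norm
    (p = 1, 2 or np.inf). Hypothetical helper for illustration."""
    a = np.linalg.eigvalsh(A)[::-1]             # lambda^down(A)
    b = np.linalg.eigvalsh(B)[::-1]             # lambda^down(B)
    d = np.linalg.eigvalsh(A - B)               # lambda(A - B), any order
    lower = np.linalg.norm(a - b, p)
    middle = np.linalg.norm(d, p)
    upper = np.linalg.norm(a - b[::-1], p)      # b[::-1] is lambda^up(B)
    return lower, middle, upper

rng = np.random.default_rng(3)
G, H = rng.standard_normal((2, 5, 5))
A, B = (G + G.T) / 2, (H + H.T) / 2
for p in (1, 2, np.inf):
    lower, middle, upper = gauge_chain(A, B, p)
    print(p, lower <= middle + 1e-9, middle <= upper + 1e-9)
```

With $p = \infty$ the left inequality is the statement of Weyl's perturbation theorem.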

The majorisations in (III.13) are significant generalisations of those in 
(II.35), which follow from these by restricting A, B to be diagonal matrices. 
Such “noncommutative” extensions exist for some other results; they are 
harder to prove. Some are given in this section; many more will occur later. 

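It is convenient to adopt the following notational shorthand. If $x, y, z$ are
$n$-vectors with nonnegative coordinates, we will write

$$\log x \prec_w \log y \quad \text{if} \quad \prod_{j=1}^{k} x_j^{\downarrow} \le \prod_{j=1}^{k} y_j^{\downarrow}, \quad \text{for } k = 1, \ldots, n; \qquad \text{(III.14)}$$

$$\log x \prec \log y \quad \text{if} \quad \log x \prec_w \log y \ \text{ and } \ \prod_{j=1}^{n} x_j^{\downarrow} = \prod_{j=1}^{n} y_j^{\downarrow}; \qquad \text{(III.15)}$$

$$\log x - \log z \prec_w \log y \quad \text{if} \quad \prod_{j=1}^{k} x_{i_j}^{\downarrow} \le \prod_{j=1}^{k} y_j^{\downarrow} \prod_{j=1}^{k} z_{i_j}^{\downarrow}, \qquad \text{(III.16)}$$

for all indices $1 \le i_1 < \cdots < i_k \le n$. Note that we are allowing the
possibility of zero coordinates in this notation.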


Theorem III.4.5 (Gel'fand-Naimark) Let $A, B$ be any two operators on
$H$. Then the singular values of $A$, $B$ and $AB$ satisfy the majorisation

$$\log s(AB) - \log s(B) \prec \log s(A). \qquad \text{(III.17)}$$


Proof. We will use the result of Exercise III.3.7. Fix any index $k$,
$1 \le k \le n$. Choose any $k$ orthonormal vectors $x_1, \ldots, x_k$, and let
$W$ be their linear span. Let $\Phi(t_1, \ldots, t_k) = t_1 t_2 \cdots t_k$.
Express $AB$ in its polar form $AB = UP$. Then, denoting by $T_W$ the
compression of an operator $T$ to the subspace $W$, we have

$$\begin{aligned}
\Phi\bigl(\lambda_1((P_W)^2), \ldots, \lambda_k((P_W)^2)\bigr) &= |\det P_W|^2 \\
&= |\det(\langle x_i, P_W x_j \rangle)|^2 \\
&= |\det(\langle x_i, P x_j \rangle)|^2 \\
&= |\det(\langle A^* U x_i, B x_j \rangle)|^2.
\end{aligned}$$

Using Exercise I.5.7 we see that this is dominated by

$$\det(\langle A^* U x_i, A^* U x_j \rangle) \, \det(\langle B x_i, B x_j \rangle).$$

The second of these determinants is equal to $\det (B^* B)_W$; the first is
equal to $\det (U^* A A^* U)_W$ and by Corollary III.1.5 is dominated by
$\prod_{j=1}^{k} s_j^2(A)$. Hence, we have

$$\Phi\bigl(\lambda_1((P_W)^2), \ldots, \lambda_k((P_W)^2)\bigr) \le \det (B^* B)_W \prod_{j=1}^{k} s_j^2(A)
= \Phi\bigl(\lambda_1((|B|^2)_W), \ldots, \lambda_k((|B|^2)_W)\bigr) \prod_{j=1}^{k} s_j^2(A).$$

Now, using Exercise III.3.7, we can conclude that

$$\Bigl(\prod_{j=1}^{k} s_{i_j}(AB)\Bigr)^2 \le \prod_{j=1}^{k} s_{i_j}^2(B) \prod_{j=1}^{k} s_j^2(A),$$

i.e.,

$$\prod_{j=1}^{k} s_{i_j}(AB) \le \prod_{j=1}^{k} s_{i_j}(B) \prod_{j=1}^{k} s_j(A), \qquad \text{(III.18)}$$

for all $1 \le i_1 < \cdots < i_k \le n$. This, by definition, is what (III.17)
says. ■
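
The inequality (III.18) can be tested by brute force in the same way as (III.10), working with logarithms of singular values. This is a minimal sketch assuming NumPy and invertible test matrices (so that all logarithms are finite); the helper name `gelfand_naimark_slack` is hypothetical.

```python
import numpy as np
from itertools import combinations

def gelfand_naimark_slack(A, B):
    """Smallest value of log(right side) - log(left side) of (III.18) over
    all index sets. Assumes A and B are invertible. Hypothetical helper."""
    la = np.log(np.linalg.svd(A, compute_uv=False))       # log s_j(A), decreasing
    lb = np.log(np.linalg.svd(B, compute_uv=False))       # log s_j(B)
    lab = np.log(np.linalg.svd(A @ B, compute_uv=False))  # log s_j(AB)
    n = len(la)
    slack = np.inf
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            idx = list(idx)
            slack = min(slack, lb[idx].sum() + la[:k].sum() - lab[idx].sum())
    return slack

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
print(gelfand_naimark_slack(A, B) >= -1e-9)
```

For $k = n$ both sides of (III.18) equal $|\det AB|$, so the returned slack is zero up to rounding.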


Remark. The statement

$$\prod_{j=1}^{k} s_j(AB) \le \prod_{j=1}^{k} s_j(A) \prod_{j=1}^{k} s_j(B), \qquad \text{(III.19)}$$

which is a special case of (III.18), is easier to prove. It is just the
statement $\|\wedge^k (AB)\| \le \|\wedge^k A\| \, \|\wedge^k B\|$. If we
temporarily introduce the notation $s^{\downarrow}(A)$ and $s^{\uparrow}(A)$ for
the vectors whose coordinates are the singular values of $A$ arranged in
decreasing order and in increasing order, respectively, then the inequalities
(III.18) and (III.19) can be combined to yield

$$\log s^{\downarrow}(A) + \log s^{\uparrow}(B) \prec \log s(AB) \prec \log s^{\downarrow}(A) + \log s^{\downarrow}(B) \qquad \text{(III.20)}$$

for any two matrices $A, B$. In conformity with our notation this is a
symbolic representation of the inequalities

$$\prod_{j=1}^{k} s_{i_j}(A) \prod_{j=1}^{k} s_{n - i_j + 1}(B) \le \prod_{j=1}^{k} s_j(AB) \le \prod_{j=1}^{k} s_j(A) \prod_{j=1}^{k} s_j(B)$$

for all $1 \le i_1 < \cdots < i_k \le n$. It is illuminating to compare this
with the statement (III.13) for eigenvalues of Hermitian matrices.

Corollary III.4.6 (Lidskii) Let $A, B$ be two positive matrices. Then all
eigenvalues of $AB$ are nonnegative and

$$\log \lambda^{\downarrow}(A) + \log \lambda^{\uparrow}(B) \prec \log \lambda(AB) \prec \log \lambda^{\downarrow}(A) + \log \lambda^{\downarrow}(B). \qquad \text{(III.21)}$$


Proof. It is enough to prove this when $B$ is invertible, since every positive
matrix is a limit of such matrices. For invertible $B$ we can write

$$AB = B^{-1/2} \bigl(B^{1/2} A B^{1/2}\bigr) B^{1/2}.$$

Now $B^{1/2} A B^{1/2}$ is positive; hence the matrix $AB$, which is similar to
it, has nonnegative eigenvalues. Now, from (III.20) we obtain

$$\log \lambda^{\downarrow}(A^{1/2}) + \log \lambda^{\uparrow}(B^{1/2}) \prec \log s(A^{1/2} B^{1/2}) \prec \log \lambda^{\downarrow}(A^{1/2}) + \log \lambda^{\downarrow}(B^{1/2}). \qquad \text{(III.22)}$$

But $s^2(A^{1/2} B^{1/2}) = \lambda^{\downarrow}(B^{1/2} A B^{1/2}) = \lambda^{\downarrow}(AB)$.
So, the majorisations (III.21) follow from (III.22). ■


III.5 Eigenvalues of Real Parts and 
Singular Values 


The Cartesian decomposition $A = \mathrm{Re}\, A + i\, \mathrm{Im}\, A$ of a
matrix $A$ associates with it two Hermitian matrices
$\mathrm{Re}\, A = \frac{A + A^*}{2}$ and $\mathrm{Im}\, A = \frac{A - A^*}{2i}$.
It is of interest to know relationships between the eigenvalues of these
matrices, those of $A$, and the singular values of $A$.

Weyl's majorant theorem (Theorem II.3.6) provides one such relationship:

$$\log |\lambda(A)| \prec \log s(A).$$


Some others, whose proofs are in the same spirit as others in this chapter, 
are given below. 


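Proposition III.5.1 (Fan-Hoffman) For every matrix $A$

$$\lambda_j^{\downarrow}(\mathrm{Re}\, A) \le s_j(A) \quad \text{for all } j = 1, \ldots, n.$$


Proof. Let $x_j$ be eigenvectors of $\mathrm{Re}\, A$ belonging to its
eigenvalues $\lambda_j^{\downarrow}(\mathrm{Re}\, A)$ and $y_j$ eigenvectors of
$|A|$ belonging to its eigenvalues $s_j(A)$, $1 \le j \le n$. For each $j$
consider the spaces span$\{x_1, \ldots, x_j\}$ and span$\{y_j, \ldots, y_n\}$.
Their dimensions add up to $n + 1$, so they have a nonzero intersection. If
$x$ is a unit vector in their intersection then

$$\begin{aligned}
\lambda_j^{\downarrow}(\mathrm{Re}\, A) &\le \langle x, (\mathrm{Re}\, A) x \rangle = \mathrm{Re} \langle x, A x \rangle \\
&\le |\langle x, A x \rangle| \le \|Ax\| \\
&= \langle x, A^* A x \rangle^{1/2} \le s_j(A).
\end{aligned}$$

■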
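
The Fan-Hoffman inequality is straightforward to observe numerically. The sketch below assumes NumPy; the helper name `fan_hoffman_gap` is hypothetical. It returns the vector of differences $s_j(A) - \lambda_j^{\downarrow}(\mathrm{Re}\, A)$, whose entries should all be nonnegative.

```python
import numpy as np

def fan_hoffman_gap(A):
    """Entries s_j(A) - lambda_j^down(Re A), j = 1, ..., n.
    Hypothetical helper for illustration."""
    re_A = (A + A.conj().T) / 2
    lam = np.linalg.eigvalsh(re_A)[::-1]      # decreasing eigenvalues of Re A
    s = np.linalg.svd(A, compute_uv=False)    # singular values, already decreasing
    return s - lam

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
print(np.all(fan_hoffman_gap(A) >= -1e-10))
```

For a positive matrix $A$ the two vectors coincide, so the gap vanishes.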

Exercise III.5.2 (i) Let $A$ be the $2 \times 2$ matrix (5 6). Then $s_2(A) = 0$,
but $\mathrm{Re}\, A$ has two nonzero eigenvalues. Hence the vector
$|\lambda(\mathrm{Re}\, A)|^{\downarrow}$ is not dominated by the vector $s(A)$.

(ii) However, note that $|\lambda(\mathrm{Re}\, A)| \prec_w s(A)$ for every
matrix $A$. (Use the triangle inequality for Ky Fan norms.)


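Proposition III.5.3 (Ky Fan) For every matrix $A$ we have

$$\mathrm{Re}\, \lambda(A) \prec \lambda(\mathrm{Re}\, A).$$

Proof. Arrange the eigenvalues $\lambda_j(A)$ in such a way that

$$\mathrm{Re}\, \lambda_1(A) \ge \mathrm{Re}\, \lambda_2(A) \ge \cdots \ge \mathrm{Re}\, \lambda_n(A).$$

Let $x_1, \ldots, x_n$ be an orthonormal Schur-basis for $A$ such that
$\lambda_j(A) = \langle x_j, A x_j \rangle$. Then
$\overline{\lambda_j(A)} = \langle x_j, A^* x_j \rangle$. Let
$W = \mathrm{span}\{x_1, \ldots, x_k\}$. Then

$$\sum_{j=1}^{k} \mathrm{Re}\, \lambda_j(A) = \sum_{j=1}^{k} \langle x_j, (\mathrm{Re}\, A) x_j \rangle = \mathrm{tr}\, (\mathrm{Re}\, A)_W
= \sum_{j=1}^{k} \lambda_j\bigl((\mathrm{Re}\, A)_W\bigr) \le \sum_{j=1}^{k} \lambda_j^{\downarrow}(\mathrm{Re}\, A).$$

■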


Exercise III.5.4 Give another proof of Proposition III.5.3 using Schur's
theorem (given in Exercise II.1.12).


Exercise III.5.5 (i) Let $X, Y$ be Hermitian matrices. Suppose that their
eigenvalues can be indexed as $\lambda_j(X)$ and $\lambda_j(Y)$,
$1 \le j \le n$, in such a way that $\lambda_j(X) \le \lambda_j(Y)$ for all $j$.
Then there exists a unitary $U$ such that $X \le U^* Y U$.

(ii) For every matrix $A$ there exists a unitary matrix $U$ such that
$\mathrm{Re}\, A \le U^* |A| U$.


An interesting consequence of Proposition III.5.1 is the following version 
of the triangle inequality for the matrix absolute value: 


Theorem III.5.6 (R.C. Thompson) Let $A, B$ be any two matrices. Then
there exist unitary matrices $U, V$ such that

$$|A + B| \le U |A| U^* + V |B| V^*.$$


Proof. Let $A + B = W |A + B|$ be a polar decomposition of $A + B$. Then
we can write

$$|A + B| = W^*(A + B) = \mathrm{Re}\, W^*(A + B) = \mathrm{Re}\, W^* A + \mathrm{Re}\, W^* B.$$

Now use Exercise III.5.5(ii). ■
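
The proof is constructive enough to be followed on a computer. The sketch below assumes NumPy; all helper names are hypothetical, and the particular unitary used for Exercise III.5.5 (matching eigenvectors of $X$ and $Y$ in the same eigenvalue order) is one possible choice, not necessarily the one intended in the exercise. It builds $U$ and $V$ from the polar factor $W$ of $A + B$ and returns the matrices needed to test the inequality.

```python
import numpy as np

def polar_factor_and_abs(M):
    """Return (W, |M|) with M = W |M| and W unitary, from the SVD M = P S Qh."""
    P, s, Qh = np.linalg.svd(M)
    return P @ Qh, Qh.conj().T @ np.diag(s) @ Qh

def dominating_unitary(X, Y):
    """For Hermitian X, Y with lambda_j(X) <= lambda_j(Y) for all j (same
    ordering), return a unitary U with X <= U* Y U."""
    _, QX = np.linalg.eigh(X)          # eigenvectors, eigenvalues increasing
    _, QY = np.linalg.eigh(Y)
    return QY @ QX.conj().T            # U* Y U then has the eigenvectors of X

def thompson_unitaries(A, B):
    """Return (|A+B|, |A|, |B|, U, V) with |A+B| <= U|A|U* + V|B|V*."""
    W, abs_sum = polar_factor_and_abs(A + B)
    _, abs_A = polar_factor_and_abs(A)
    _, abs_B = polar_factor_and_abs(B)
    re = lambda M: (M + M.conj().T) / 2
    # Fan-Hoffman applied to W*A gives lambda_j(Re W*A) <= s_j(W*A) = s_j(A).
    U1 = dominating_unitary(re(W.conj().T @ A), abs_A)
    V1 = dominating_unitary(re(W.conj().T @ B), abs_B)
    return abs_sum, abs_A, abs_B, U1.conj().T, V1.conj().T

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
abs_sum, abs_A, abs_B, U, V = thompson_unitaries(A, B)
D = U @ abs_A @ U.conj().T + V @ abs_B @ V.conj().T - abs_sum
print(np.linalg.eigvalsh((D + D.conj().T) / 2).min() >= -1e-9)
```

The last lines test positivity of the difference through its smallest eigenvalue, which is only a floating-point indication.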

Exercise III.5.7 (i) Find $2 \times 2$ matrices $A, B$ such that the inequality
$|A + B| \le |A| + |B|$ is false for them.

(ii) Find $2 \times 2$ matrices $A, B$ for which there does not exist any
unitary matrix $U$ such that $|A + B| \le U(|A| + |B|)U^*$.


III.6 Problems


Problem III.6.1. (The minimax principle for singular values) For
any operator $A$ on $H$ we have

$$s_j(A) = \max_{M :\, \dim M = j} \; \min_{x \in M,\, \|x\| = 1} \|Ax\|
= \min_{N :\, \dim N = n - j + 1} \; \max_{x \in N,\, \|x\| = 1} \|Ax\|$$

for $1 \le j \le n$.

Problem III.6.2. Let $A, B$ be any two operators. Then

$$s_j(AB) \le \|B\| \, s_j(A),$$

$$s_j(AB) \le \|A\| \, s_j(B)$$

for $1 \le j \le n$.


Problem III.6.3. For $j = 0, 1, \ldots, n$, let

$$R_j = \{T \in L(H) : \mathrm{rank}\, T \le j\}.$$

Show that for $j = 1, 2, \ldots, n$,

$$s_j(A) = \min_{T \in R_{j-1}} \|A - T\|.$$
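
One half of Problem III.6.3 can be seen directly from the singular value decomposition: truncating it to rank $j - 1$ gives an operator $T \in R_{j-1}$ with $\|A - T\| = s_j(A)$. The sketch below assumes NumPy, and the helper name `truncation_error` is hypothetical; it illustrates that the minimum is attained and does not prove that no other $T$ does better.

```python
import numpy as np

def truncation_error(A, j):
    """Return (s_j(A), ||A - T||), where T is the rank-(j-1) truncation of
    the SVD of A and ||.|| is the operator norm. Hypothetical helper."""
    P, s, Qh = np.linalg.svd(A)
    r = j - 1
    T = (P[:, :r] * s[:r]) @ Qh[:r, :]     # keep the r largest singular triples
    return s[j - 1], np.linalg.norm(A - T, 2)

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 5))
for j in range(1, 6):
    s_j, err = truncation_error(A, j)
    print(j, np.isclose(s_j, err))
```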

Problem III.6.4. Show that if $A$ is any operator and $H$ is any operator
of rank $k$, then

$$s_j(A) \ge s_{j+k}(A + H), \quad j = 1, 2, \ldots, n - k.$$

Problem III.6.5. For any two operators $A, B$ and any two indices $i, j$ such
that $i + j \le n + 1$, we have

$$s_{i+j-1}(A + B) \le s_i(A) + s_j(B),$$

$$s_{i+j-1}(AB) \le s_i(A) \, s_j(B).$$

Problem III.6.6. Show that for every operator $A$ and for each
$k = 1, 2, \ldots, n$, we have

$$\sum_{j=1}^{k} s_j(A) = \max \Bigl| \sum_{j=1}^{k} \langle y_j, A x_j \rangle \Bigr|,$$

where the maximum is over all choices of orthonormal $k$-tuples
$x_1, \ldots, x_k$ and $y_1, \ldots, y_k$. This can also be written as

$$\sum_{j=1}^{k} s_j(A) = \max \Bigl| \sum_{j=1}^{k} \langle x_j, U A x_j \rangle \Bigr|,$$

where the maximum is taken over all choices of unitary operators $U$ and
orthonormal $k$-tuples $x_1, \ldots, x_k$. Note that for $k = 1$ this reduces
to the statement

$$\|A\| = \max_{\|x\| = \|y\| = 1} |\langle y, A x \rangle|.$$

For $k = 1, 2, \ldots, n$, the above extremal representations can be used to
give another proof of the fact that the expressions
$\|A\|_{(k)} = \sum_{j=1}^{k} s_j(A)$ are norms. (See Exercise II.1.15.)


Problem III.6.7. Let $A = (a_{ij})$ be a Hermitian matrix. For each
$i = 1, \ldots, n$, let

$$r_i = \Bigl( \sum_{j \ne i} |a_{ij}|^2 \Bigr)^{1/2}.$$

Show that each interval $[a_{ii} - r_i, a_{ii} + r_i]$ contains at least one
eigenvalue of $A$.
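
The statement of Problem III.6.7 can be observed on examples before proving it. This is a minimal sketch assuming NumPy; the helper name `intervals_hit` and the tolerance are illustrative choices.

```python
import numpy as np

def intervals_hit(A, tol=1e-10):
    """For a Hermitian A, report for each i whether the interval
    [a_ii - r_i, a_ii + r_i] contains an eigenvalue of A, where r_i is the
    Euclidean norm of the off-diagonal part of row i. Hypothetical helper."""
    eigs = np.linalg.eigvalsh(A)
    hits = []
    for i in range(A.shape[0]):
        center = A[i, i].real
        r = np.linalg.norm(np.delete(A[i], i))   # (sum_{j != i} |a_ij|^2)^(1/2)
        hits.append(bool(np.any(np.abs(eigs - center) <= r + tol)))
    return hits

rng = np.random.default_rng(8)
G = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
print(all(intervals_hit((G + G.conj().T) / 2)))
```

Note that $r_i = \|(A - a_{ii}) e_i\|$ for the $i$th standard basis vector $e_i$, which suggests a link with the residual bounds of Problem III.6.9.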


Problem III.6.8. Let $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$ be the
eigenvalues of a Hermitian matrix $A$. We have seen that the $n - 1$
eigenvalues of any principal submatrix of $A$ of order $n - 1$ interlace with
these numbers. If $\delta_1 \ge \delta_2 \ge \cdots \ge \delta_{n-1}$ are the
roots of the polynomial that is the derivative of the characteristic
polynomial of $A$, then we have by Rolle's Theorem

$$\alpha_1 \ge \delta_1 \ge \alpha_2 \ge \cdots \ge \delta_{n-1} \ge \alpha_n.$$

Show that for each $j$ there exists a principal submatrix $B$ of $A$ for which
$\alpha_j \ge \lambda_j^{\downarrow}(B) \ge \delta_j$ and another principal
submatrix $C$ for which $\delta_j \ge \lambda_j^{\downarrow}(C) \ge \alpha_{j+1}$.


Problem III.6.9. Most of the results in this chapter gave descriptions
of eigenvalues of a Hermitian operator in terms of the numbers
$\langle x, Ax \rangle$ when $x$ varies over unit vectors. Sometimes in
computational problems an "approximate" eigenvalue $\lambda$ and an
"approximate" eigenvector $x$ are already known. The number
$\langle x, Ax \rangle$ can then be used to further refine this information.

For a given unit vector $x$, let $\rho = \langle x, Ax \rangle$,
$\varepsilon = \|(A - \rho)x\|$.

(i) Let $(a, b)$ be an open interval that contains $\rho$ but does not contain
any eigenvalue of $A$. Show that

$$(b - \rho)(\rho - a) \le \varepsilon^2.$$

(ii) Show that there exists an eigenvalue $\alpha$ of $A$ such that
$|\alpha - \rho| \le \varepsilon$.
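
Both parts of Problem III.6.9 can be explored numerically. The sketch below assumes NumPy and a Hermitian matrix of order at least 2; the helper name `residual_data` is hypothetical. For part (i) it takes $a$ and $b$ to be the two consecutive eigenvalues that enclose $\rho$, which is the limiting case of the open intervals allowed in the problem.

```python
import numpy as np

def residual_data(A, x):
    """For Hermitian A and a nonzero vector x (normalised here), return
    (rho, eps, dist, prod): the Rayleigh quotient rho, the residual norm
    eps = ||(A - rho) x||, the distance from rho to the spectrum, and
    (b - rho)(rho - a) for the consecutive eigenvalues a <= rho <= b.
    Hypothetical helper for illustration."""
    x = x / np.linalg.norm(x)
    rho = np.vdot(x, A @ x).real
    eps = np.linalg.norm(A @ x - rho * x)
    eigs = np.linalg.eigvalsh(A)                        # increasing order
    dist = np.min(np.abs(eigs - rho))
    idx = int(np.clip(np.searchsorted(eigs, rho), 1, len(eigs) - 1))
    a, b = eigs[idx - 1], eigs[idx]
    return rho, eps, dist, (b - rho) * (rho - a)

rng = np.random.default_rng(9)
G = rng.standard_normal((6, 6))
A = (G + G.T) / 2
rho, eps, dist, prod = residual_data(A, rng.standard_normal(6))
print(dist <= eps + 1e-10, prod <= eps**2 + 1e-10)
```

When $x$ is close to an eigenvector, $\varepsilon$ is small and $\rho$ is correspondingly close to an eigenvalue.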


Problem III.6.10. Let $\rho$ and $\varepsilon$ be defined as in the above
problem. Let $(a, b)$ be an open interval that contains $\rho$ and only one
eigenvalue $\alpha$ of $A$. Then

$$\rho - \frac{\varepsilon^2}{b - \rho} \le \alpha \le \rho + \frac{\varepsilon^2}{\rho - a}.$$

This is called the Kato-Temple inequality. Note that if $\rho - a$ and
$b - \rho$ are much larger than $\varepsilon$, then this improves the
inequality in part (ii) of Problem III.6.9.


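Problem III.6.11. Show that for every Hermitian matrix $A$

$$\sum_{j=1}^{k} \lambda_j^{\downarrow}(A) = \max_{UU^* = I_k} \mathrm{tr}\, U A U^*,$$

$$\sum_{j=1}^{k} \lambda_j^{\uparrow}(A) = \min_{UU^* = I_k} \mathrm{tr}\, U A U^*$$

for $1 \le k \le n$, where the extrema are taken over $k \times n$ matrices $U$
that satisfy $UU^* = I_k$, $I_k$ being the $k \times k$ identity matrix. Show
that if $A$ is positive, then

$$\prod_{j=1}^{k} \lambda_j^{\downarrow}(A) = \max_{UU^* = I_k} \det U A U^*,$$

$$\prod_{j=1}^{k} \lambda_j^{\uparrow}(A) = \min_{UU^* = I_k} \det U A U^*.$$

(See Problem I.6.15.)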


Problem III.6.12. Let $A, B$ be any matrices. Then

$$\sum_{j=1}^{n} s_j(A) s_j(B) = \sup_{U, V} |\mathrm{tr}\, U A V B| = \sup_{U, V} \mathrm{Re}\, \mathrm{tr}\, U A V B,$$

where $U, V$ vary over all unitary matrices.


Problem III.6.13. (Perturbation theorem for singular values) Let
$A, B$ be any $n \times n$ matrices and let $\Phi$ be any symmetric gauge
function on $\mathbb{R}^n$. Then

$$\Phi\bigl(s(A) - s(B)\bigr) \le \Phi\bigl(s(A - B)\bigr).$$

In particular,

$$\max_j |s_j(A) - s_j(B)| \le \|A - B\|.$$

[Hint: See Theorem III.4.4 and Exercise II.1.15.]


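Problem III.6.14. For positive matrices $A, B$ show that

$$\lambda^{\downarrow}(A) \cdot \lambda^{\uparrow}(B) \prec_w \lambda(AB) \prec_w \lambda^{\downarrow}(A) \cdot \lambda^{\downarrow}(B).$$

For Hermitian matrices $A, B$ show that

$$\langle \lambda^{\downarrow}(A), \lambda^{\uparrow}(B) \rangle \le \mathrm{tr}\, AB \le \langle \lambda^{\downarrow}(A), \lambda^{\downarrow}(B) \rangle.$$

(Compare these with (II.36) and (II.37).)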


Problem III.6.15. Let $A, B$ be Hermitian matrices. Use the second part
of Problem III.6.14 to show that

$$\|\mathrm{Eig}^{\downarrow} A - \mathrm{Eig}^{\downarrow} B\|_2 \le \|A - B\|_2 \le \|\mathrm{Eig}^{\downarrow} A - \mathrm{Eig}^{\uparrow} B\|_2.$$

Note the analogy between this and Theorem III.2.8. (In Chapter IV we
will see that both these results are true for a whole family of norms called
unitarily invariant norms. This more general result is a consequence of
Theorem III.4.4.)


III.7 Notes and References


As pointed out in Exercise III.1.6, many of the results in Sections III.1 and
III.2 could be derived from each other. Hence, it seems fair to say that
the variational principles for eigenvalues originated with A.L. Cauchy’s
interlacing theorem. A pertinent reference is Sur l’équation à l’aide de
laquelle on détermine les inégalités séculaires des mouvements des planètes,
1829, in A.L. Cauchy, Oeuvres Complètes (IIe Série), Volume 9,
Gauthier-Villars.

The minimax principle was first stated by E. Fischer, Über quadratische
Formen mit reellen Koeffizienten, Monatsh. Math. Phys., 16 (1905) 234-249.
The monotonicity principle and many of the results of Section III.2
were proved by H. Weyl in Das asymptotische Verteilungsgesetz der
Eigenwerte linearer partieller Differentialgleichungen, Math. Ann., 71 (1911)
441-469. In a series of papers beginning with Über die Eigenwerte bei den
Differentialgleichungen der mathematischen Physik, Math. Z., 7 (1920) 1-57,
R. Courant exploited the full power of the minimax principle. Thus the
principle is often described as the Courant-Fischer-Weyl principle.

As the titles of these papers suggest, the variational principles for
eigenvalues were discovered in connection with problems of physics. One
famous work where many of these were used is The Theory of Sound by Lord
Rayleigh, reprinted by Dover in 1945. The modern applied mathematics 
classic Methods of Mathematical Physics by R. Courant and D. Hilbert, 
Wiley, 1953, is replete with applications of variational principles. For a still 
more recent source, see M. Reed and B. Simon, Methods of Modern Math- 
ematical Physics, Volume 4, Academic Press, 1978. Of course, here most of 
the interest is in infinite-dimensional problems and consequently the results 
are much more complicated. The numerical analyst could turn to B.N. Par- 
lett, The Symmetric Eigenvalue Problem, Prentice-Hall, 1980, and to G.W. 
Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990. 

The converse to the interlacing theorem given in Theorem III.1.9 was
first proved in L. Mirsky, Matrices with prescribed characteristic roots and
diagonal elements, J. London Math. Soc., 33 (1958) 14-21. We do not know
whether the similar question for higher dimensional compressions has been
answered. More precisely, let $\alpha_1 \ge \cdots \ge \alpha_n$ and
$\beta_1 \ge \cdots \ge \beta_n$ be real numbers such that
$\sum \alpha_j = \sum \beta_j$. What conditions must these numbers satisfy so
that there exists an orthogonal projection $P$ of rank $k$ such that the
matrix $A = \mathrm{diag}(\alpha_1, \ldots, \alpha_n)$ when compressed to
range $P$ has eigenvalues $\beta_1, \ldots, \beta_k$ and when compressed to
(range $P)^{\perp}$ has eigenvalues $\beta_{k+1}, \ldots, \beta_n$?
(Theorem III.1.9 is the case $k = n - 1$.)

Aronszajn’s inequality appeared in N. Aronszajn, Rayleigh-Ritz and 
A. Weinstein methods for approximation of eigenvalues. I. Operators in 
a Hilbert space, Proc. Nat. Acad. Sci. U.S.A., 34(1948) 474-480. The ele- 
gant proof of its equivalence to Weyl’s inequality is due to H.W. Wielandt, 
Topics in the Analytic Theory of Matrices, mimeographed lecture notes, 
University of Wisconsin, 1967. 

Theorem III.3.5 was proved in H.W. Wielandt, An extremum property 
of sums of eigenvalues, Proc. Amer. Math. Soc., 6 (1955) 106-110. The 
motivation for Wielandt was that he “did not succeed in completing the 
interesting sketch of a proof given by Lidskii” of the statement given in 
Exercise III.4.3. He noted that this is equivalent to what we have stated
as Theorem III.4.1, and derived it from his new minimax principle.
Interestingly, now several different proofs of Lidskii’s Theorem are known.
The second proof given in Section III.4 is due to M.F. Smiley, Inequalities
related to Lidskii’s, Proc. Amer. Math. Soc., 19 (1968) 1029-1034. We will
see some other proofs later. However, Theorem III.3.5 is more general, has 
several other applications, and has led to a lot of research. An account 
of the earlier work on these questions may be found in A.R. Amir-Moez, 
Extreme Properties of Linear Transformations and Geometry in Unitary 
Spaces, Texas Tech. University, 1968, from which our treatment of
Section III.3 has been adapted. An attempt to extend these ideas to infinite
dimensions was made in R.C. Riddell, Minimax problems on Grassmann
manifolds, Advances in Math., 54 (1984) 107-199, where connections with
differential geometry and some problems in quantum physics are also de- 
veloped. The tower of subspaces occurring in Theorem III.3.5 suggests a 
connection with Schubert calculus in algebraic geometry. This connection 
is yet to be fully understood. 

Lidskii’s Theorem has an interesting history. It appeared first in V.B. 
Lidskii, On the proper values of a sum and product of symmetric matrices, 
Dokl. Akad. Nauk SSSR, 75 (1950) 769-772. It seems that Lidskii provided 
an elementary (matrix analytic) proof of the result which F. Berezin and 
I.M. Gel’fand had proved by more advanced (Lie theoretic) techniques in 
connection with their work that appeared later in Some remarks on the 
theory of spherical functions on symmetric Riemannian manifolds, Trudi 
Moscow Math. Ob., 5 (1956) 311-351. As mentioned above, difficulties with 
this “elementary” proof led Wielandt to the discovery of his minimax prin- 
ciple. 

Among the several directions this work opened up, one led to the follow- 
ing question. What relations must three n-tuples of real numbers satisfy in 
order to be the eigenvalues of some Hermitian matrices A,B and A+ B? 
Necessary conditions are given by Theorem III.4.1. Many more were discov- 
ered by others. A. Horn, Eigenvalues of sums of Hermitian matrices, Pacific 
J. Math., 12(1962) 225-242, derived necessary and sufficient conditions in 
the above problem for the case n = 4, and wrote down a set of conditions 
which he conjectured would be necessary and sufficient for n > 4. In a short 
paper Spectral polyhedron of a sum of two Hermitian matrices, Functional 
Analysis and Appl., 10 (1982) 76-77, B.V. Lidskii has sketched a “proof” 
establishing Horn’s conjecture. This proof, however, needs a lot of details 
to be filled in; these have not yet been published by B.V. Lidskii (or anyone 
else). 

When should a theorem be considered to be proved? For an interesting 
discussion of this question, see S. Smale, The fundamental theorem of al- 
gebra and complexity theory, Bull. Amer. Math. Soc. (New Series), 4(1981) 
1-36. 

Theorem III.4.5 was proved in I.M. Gel’fand and M. Naimark, The relation
between the unitary representations of the complex unimodular group
and its unitary subgroup, Izv Akad. Nauk SSSR Ser. Mat. 14 (1950) 239-260.
Many of the questions concerning eigenvalues and singular values of
sums and products were first framed in this paper. An excellent summary
of these results can be found in A.S. Markus, The eigen- and singular values
of the sum and product of linear operators, Russian Math. Surveys, 19
(1964) 92-120.

The structure of inequalities like (III.10) and (III.18) was carefully
analysed in several papers by R.C. Thompson and his students. The asymmetric
way in which $A$ and $B$ enter (III.10) is remedied by one of their
inequalities, which says

$$\sum_{j=1}^{k} \lambda_{i_j + p_j - j}^{\downarrow}(A + B) \le \sum_{j=1}^{k} \lambda_{i_j}^{\downarrow}(A) + \sum_{j=1}^{k} \lambda_{p_j}^{\downarrow}(B)$$

for any indices $1 \le i_1 < \cdots < i_k \le n$, $1 \le p_1 < \cdots < p_k \le n$,
such that $i_k + p_k - k \le n$. A similar generalisation of (III.18) has also
been proved. References to this work may be found in the book by Marshall and
Olkin cited in Chapter II.

Proposition III.5.1 is proved in K. Fan and A.J. Hoffman, Some metric 
inequalities in the space of matrices, Proc. Amer. Math. Soc., 6 (1955) 111- 
116. 

Results of Proposition III.5.3, Problems III.6.5, III.6.6, III.6.11, and 
III.6.12 were first proved by Ky Fan in several papers. References to these 
may be found in I.C. Gohberg and M.G. Krein, Introduction to the Theory 
of Linear Nonselfadjoint operators, American Math. Society, 1969, and in 
the Marshall-Olkin book cited earlier. 

The matrix triangle inequality (Theorem III.5.6) was proved in R.C.
Thompson, Convex and concave functions of singular values of matrix
sums, Pacific J. Math., 66 (1976) 285-290. An extension to infinite
dimensions was attempted in C. Akemann, J. Anderson, and G. Pedersen,
Triangle inequalities in operator algebras, Linear and Multilinear Algebra,
11 (1982) 167-178. For operators $A, B$ on an infinite-dimensional Hilbert
space there exist isometries $U, V$ such that

$$|A + B| \le U |A| U^* + V |B| V^*.$$

Also, for each $\varepsilon > 0$ there exist unitaries $U, V$ such that

$$|A + B| \le U |A| U^* + V |B| V^* + \varepsilon I.$$

It is not known whether the $\varepsilon$ part in the last statement is
necessary.

Refinements of the interlacing principle such as the one in Problem III.6.8
have been obtained by several authors, including R.C. Thompson. See, for
example, his paper Principal submatrices II, Linear Algebra Appl., 1 (1968)
211-243.

One may wonder whether there are interlacing theorems for singular values.
There are, although they are a little different from the ones for
eigenvalues. This is best understood if we extend the definition of singular
values to rectangular matrices. Let $A$ be an $m \times n$ matrix. Let
$r = \min(m, n)$. The $r$ numbers that are the common eigenvalues of
$(A^* A)^{1/2}$ and $(AA^*)^{1/2}$ are called the singular values of $A$.
(Sometimes a sequence of zeroes is added to make $\max(m, n)$ singular values
in all.) Many of the results for singular values that we have proved can be
carried over to this setting. See, e.g., the books by Horn and Johnson cited
in Chapter I.

Let $A$ be a rectangular matrix and let $B$ be a matrix obtained by deleting
any row or any column of $A$. Then the minimax principle can be used to
prove that the singular values of $A$ and $B$ interlace. The reader should
work this out, and see that when $A$ is an $n \times n$ matrix and $B$ a
principal submatrix of order $n - 1$ then this gives

$$\begin{aligned}
s_1(A) &\ge s_1(B) \ge s_3(A), \\
s_2(A) &\ge s_2(B) \ge s_4(A), \\
&\ \ \vdots \\
s_{n-2}(A) &\ge s_{n-2}(B) \ge s_n(A), \\
s_{n-1}(A) &\ge s_{n-1}(B) \ge 0.
\end{aligned}$$

For more such results, see R.C. Thompson, Principal submatrices IX, Linear
Algebra and Appl., 5 (1972) 1-12.

Inequalities like the ones in Problems III.6.9 and III.6.10 are called 
“residual bounds” in the numerical analysis literature. For more such results, see 
the book by Parlett cited above, and F. Chatelin, Spectral Approximation 
of Linear Operators, Academic Press, 1983. Several refinements, extensions, 
and applications of these results in atomic physics are described in 
the book by Reed and Simon cited above. 

The results of Theorem III.4.4 and Problem III.6.13 were noted by 
L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. 
J. Math., Oxford Ser. (2), 11(1960) 50-59. This paper contains a lucid survey 
of several related problems and has stimulated a lot of research. The 
inequalities in Problem III.6.15 were first stated in K. Löwner, Über monotone 
Matrixfunktionen, Math. Z., 38 (1934) 177-216. 

Let A = UP be a polar decomposition of A. Weyl’s majorant theorem 
gives a relationship between the eigenvalues of A and those of P (the sin- 
gular values of A). A relation between the eigenvalues of A and those of U 
was proved by A. Horn and R. Steinberg, Eigenvalues of the unitary part 
of a matrix, Pacific J. Math., 9(1959) 541-550. This is in the form of a 
majorisation between the arguments of the eigenvalues: 


$$\arg \lambda(A) \prec \arg \lambda(U).$$


A theorem very much like Theorems III.4.1 and III.4.5 was proved by 
A. Nudel'man and P. Švarcman, The spectrum of a product of unitary matrices, 
Uspehi Mat. Nauk, 13 (1958) 111-117. Let A, B be unitary matrices. 
Label the eigenvalues of A, B, and AB as $e^{i\alpha_1}, \ldots, e^{i\alpha_n}$; 
$e^{i\beta_1}, \ldots, e^{i\beta_n}$; and $e^{i\gamma_1}, \ldots, e^{i\gamma_n}$, respectively, in such a way that 

$$\begin{aligned}
2\pi &> \alpha_1 \ge \cdots \ge \alpha_n \ge 0, \\
2\pi &> \beta_1 \ge \cdots \ge \beta_n \ge 0, \\
2\pi &> \gamma_1 \ge \cdots \ge \gamma_n \ge 0.
\end{aligned}$$

If $\alpha_1 + \beta_1 < 2\pi$, then for any choice of indices $1 \le i_1 < \cdots < i_k \le n$ we 
have 

$$\sum_{j=1}^k \gamma_{i_j} \le \sum_{j=1}^k \alpha_{i_j} + \sum_{j=1}^k \beta_j.$$

These inequalities can also be written in the form of a majorisation between 
n-vectors: 

$$\gamma - \alpha \prec \beta.$$

For a generalisation in the same spirit as the one of inequalities (III.10) 
and (III.18) mentioned earlier, see R.C. Thompson, On the eigenvalues 
of a product of unitary matrices, Linear and Multilinear Algebra, 2(1974) 
13-24. 


IV 


Symmetric Norms 


In this chapter we study norms on the space of matrices that are invariant 
under multiplication by unitaries. Their properties are closely linked to 
those of symmetric gauge functions on $\mathbb{R}^n$. We also study norms that are 
invariant under unitary conjugations. Some of the inequalities proved in 
earlier chapters lead to inequalities involving these norms. 


IV.1 Norms on $\mathbb{C}^n$ 


Let us begin by considering the familiar p-norms frequently used in analysis. 
For a vector $x = (x_1, \ldots, x_n)$ we define 

$$\|x\|_p = \Bigl(\sum_{i=1}^n |x_i|^p\Bigr)^{1/p}, \quad 1 \le p < \infty, \tag{IV.1}$$

$$\|x\|_\infty = \max_{1 \le i \le n} |x_i|. \tag{IV.2}$$

For each $1 \le p \le \infty$, $\|x\|_p$ defines a norm on $\mathbb{C}^n$. These are called the 
p-norms or the $l_p$-norms. The notation (IV.2) is justified because of the 
fact that 

$$\|x\|_\infty = \lim_{p \to \infty} \|x\|_p. \tag{IV.3}$$

Some of the pleasant properties of this family of norms are 

$$\|x\|_p = \|\,|x|\,\|_p \quad \text{for all } x \in \mathbb{C}^n, \tag{IV.4}$$

$$\|x\|_p \le \|y\|_p \quad \text{if } |x| \le |y|, \tag{IV.5}$$

$$\|x\|_p = \|Px\|_p \quad \text{for all } x \in \mathbb{C}^n,\ P \in S_n. \tag{IV.6}$$

(Recall the notations: $|x| = (|x_1|, \ldots, |x_n|)$, and $|x| \le |y|$ if $|x_j| \le |y_j|$ for 
$1 \le j \le n$. $S_n$ is the set of permutation matrices.) A norm on $\mathbb{C}^n$ is called 
gauge invariant or absolute if it satisfies the condition (IV.4), 
monotone if it satisfies (IV.5), and permutation invariant or symmetric if it 
satisfies (IV.6). The first two of these conditions turn out to be equivalent: 


Proposition IV.1.1 A norm on $\mathbb{C}^n$ is gauge invariant if and only if it is 
monotone. 


Proof. Monotonicity clearly implies gauge invariance. Conversely, if a 
norm $\|\cdot\|$ is gauge invariant, then to show that it is monotone it is 
enough to show that $\|x\| \le \|y\|$ whenever $x_j = t_j y_j$ for some real numbers 
$0 \le t_j \le 1$, $j = 1, 2, \ldots, n$. Further, it suffices to consider the special case 
when all $t_j$ except one are equal to 1. But then 

$$\begin{aligned}
\|(y_1, \ldots, t y_k, \ldots, y_n)\|
&= \Bigl\| \Bigl( \tfrac{1+t}{2} y_1 + \tfrac{1-t}{2} y_1, \ldots, \tfrac{1+t}{2} y_k - \tfrac{1-t}{2} y_k, \ldots, \tfrac{1+t}{2} y_n + \tfrac{1-t}{2} y_n \Bigr) \Bigr\| \\
&\le \tfrac{1+t}{2} \|(y_1, \ldots, y_n)\| + \tfrac{1-t}{2} \|(y_1, \ldots, -y_k, \ldots, y_n)\| \\
&= \|(y_1, \ldots, y_n)\|. \qquad ∎
\end{aligned}$$


Example IV.1.2 Consider the following norms on $\mathbb{R}^2$: 
(i) $\|x\| = |x_1| + |x_2| + |x_1 - x_2|$. 
(ii) $\|x\| = |x_1| + |x_1 - x_2|$. 
(iii) $\|x\| = 2|x_1| + |x_2|$. 


The first of these is symmetric but not gauge invariant, the second is neither 
symmetric nor gauge invariant, while the third is not symmetric but is 
gauge invariant. 
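
The three claims of Example IV.1.2 can be checked mechanically. The sketch below is only an illustration: the function names and the integer test grid are choices made here, not part of the text, and passing on a finite grid is evidence for a property rather than a proof of it (a single failing point, on the other hand, does disprove it).

```python
from itertools import product

def norm_i(x):
    # (i)   |x1| + |x2| + |x1 - x2|
    return abs(x[0]) + abs(x[1]) + abs(x[0] - x[1])

def norm_ii(x):
    # (ii)  |x1| + |x1 - x2|
    return abs(x[0]) + abs(x[0] - x[1])

def norm_iii(x):
    # (iii) 2|x1| + |x2|
    return 2 * abs(x[0]) + abs(x[1])

# Hypothetical test grid: integer points of [-3, 3]^2, so all arithmetic is exact.
GRID = list(product(range(-3, 4), repeat=2))

def is_symmetric(norm):
    """True if norm(x1, x2) == norm(x2, x1) at every grid point."""
    return all(norm((a, b)) == norm((b, a)) for a, b in GRID)

def is_gauge_invariant(norm):
    """True if norm(x) == norm(|x|) at every grid point (sign changes in R^2)."""
    return all(norm((a, b)) == norm((abs(a), abs(b))) for a, b in GRID)

results = {
    name: (is_symmetric(f), is_gauge_invariant(f))
    for name, f in (("i", norm_i), ("ii", norm_ii), ("iii", norm_iii))
}
print(results)
```

Each entry of `results` is the pair (symmetric, gauge invariant) for the corresponding norm; for instance, the point (1, -1) already shows that norm (i) is not gauge invariant, since it gives 4 while (1, 1) gives 2.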


Norms that are both symmetric and gauge invariant are especially inter- 
esting. Before studying more examples and properties of such norms, let us 
make a few remarks. 

Let $\mathbb{T}$ be the circle group; i.e., the multiplicative group of all complex 
numbers of modulus 1. Let $S_n \circ \mathbb{T}$ be the semidirect product of $S_n$ and $\mathbb{T}$. 
In other words, this is the group of all $n \times n$ matrices that have exactly one 
nonzero entry on each row and each column, and this nonzero entry has 
modulus 1. We will call such matrices complex permutation matrices. 
Then a norm $\|\cdot\|$ on $\mathbb{C}^n$ is symmetric and gauge invariant if 

$$\|x\| = \|Tx\| \quad \text{for all complex permutations } T. \tag{IV.7}$$

In other words, the group of (linear) isometries for $\|\cdot\|$ contains $S_n \circ \mathbb{T}$ as a 
subgroup. (Linear isometries for a norm $\|\cdot\|$ are those linear transformations 
on $\mathbb{C}^n$ that preserve $\|\cdot\|$.) 


Exercise IV.1.3 For the Euclidean norm $\|x\|_2 = (\sum |x_i|^2)^{1/2}$ the group 
of isometries is the group of all unitary matrices, which is much larger than 
the complex permutation group. Show that for each of the norms $\|x\|_1$ and 
$\|x\|_\infty$ the group of isometries is the complex permutation group. 


Note that gauge invariant norms on $\mathbb{C}^n$ are determined by those on 
$\mathbb{R}^n$. Symmetric gauge invariant norms on $\mathbb{R}^n$ are called symmetric gauge 
functions. We have come across them earlier (Example II.3.13). To repeat, 
a map $\Phi : \mathbb{R}^n \to \mathbb{R}_+$ is called a symmetric gauge function if 

(i) $\Phi$ is a norm, 
(ii) $\Phi(Px) = \Phi(x)$ for all $x \in \mathbb{R}^n$ and $P \in S_n$, 
(iii) $\Phi(\varepsilon_1 x_1, \ldots, \varepsilon_n x_n) = \Phi(x_1, \ldots, x_n)$ if $\varepsilon_j = \pm 1$. 

In addition, we will always assume that $\Phi$ is normalised, so that 

(iv) $\Phi(1, 0, \ldots, 0) = 1$. 

The conditions (ii) and (iii) can be expressed together by saying that $\Phi$ 
is invariant under the group $S_n \circ \mathbb{Z}_2$ consisting of permutations and sign 
changes of the coordinates. Notice also that a symmetric gauge function is 
completely determined by its values on $\mathbb{R}^n_+$. 


Example IV.1.4 If the coordinates of x are arranged so that $|x_1| \ge |x_2| \ge \cdots \ge |x_n|$, 
then for each $k = 1, 2, \ldots, n$, the function 

$$\Phi_{(k)}(x) = \sum_{j=1}^k |x_j| \tag{IV.8}$$

is a symmetric gauge function. We will also use the notation $\|x\|_{(k)}$ for 
these. The parentheses are used to distinguish these norms from the 
p-norms defined earlier. Indeed, note that $\|x\|_{(1)} = \|x\|_\infty$ and $\|x\|_{(n)} = \|x\|_1$. 


We have observed in Problem II.5.11 that these norms play a very 
distinguished role: if $\Phi_{(k)}(x) \le \Phi_{(k)}(y)$ for all $k = 1, 2, \ldots, n$, then $\Phi(x) \le \Phi(y)$ 
for every symmetric gauge function $\Phi$. Thus an infinite family of norm 
inequalities follows from a finite one. 


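Proposition IV.1.5 For each $k = 1, 2, \ldots, n$, 

$$\Phi_{(k)}(x) = \min\{\Phi_{(n)}(u) + k\,\Phi_{(1)}(v) : x = u + v\}. \tag{IV.9}$$

Proof. We may assume, without loss of generality, that $x \in \mathbb{R}^n_+$. If $x = u + v$, 
then $\Phi_{(k)}(x) \le \Phi_{(k)}(u) + \Phi_{(k)}(v) \le \Phi_{(n)}(u) + k\,\Phi_{(1)}(v)$. If we choose 

$$\begin{aligned}
u &= (x_1^{\downarrow} - x_k^{\downarrow},\ x_2^{\downarrow} - x_k^{\downarrow},\ \ldots,\ x_k^{\downarrow} - x_k^{\downarrow},\ 0, \ldots, 0), \\
v &= (x_k^{\downarrow}, \ldots, x_k^{\downarrow},\ x_{k+1}^{\downarrow}, \ldots, x_n^{\downarrow}),
\end{aligned}$$

then 

$$\begin{aligned}
u + v &= x^{\downarrow}, \\
\Phi_{(n)}(u) &= \Phi_{(k)}(x) - k x_k^{\downarrow}, \\
\Phi_{(1)}(v) &= x_k^{\downarrow},
\end{aligned}$$

and the proposition follows. ∎ 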
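
The splitting used in this proof is easy to test numerically. The following is a small sketch, not part of the original argument: the helper names `ky_fan` and `split_from_proof` and the sample vector are assumptions made for illustration. It builds u and v as in the proof, confirms that they attain the value $\Phi_{(k)}(x)$, and confirms that randomly chosen splittings never do better.

```python
import random

def ky_fan(x, k):
    """Phi_(k)(x): the sum of the k largest absolute values of the coordinates."""
    return sum(sorted((abs(t) for t in x), reverse=True)[:k])

def split_from_proof(x, k):
    """Return (x_down, u, v): |x| sorted decreasingly, and u, v chosen as in the proof."""
    xs = sorted((abs(t) for t in x), reverse=True)
    xk = xs[k - 1]
    u = [xs[j] - xk if j < k else 0 for j in range(len(xs))]
    v = [xk if j < k else xs[j] for j in range(len(xs))]
    return xs, u, v

x = [3, -7, 2, 5, -1]      # arbitrary integer example, so all comparisons are exact
n = len(x)
rng = random.Random(0)
attained, bounded = [], []
for k in range(1, n + 1):
    xs, u, v = split_from_proof(x, k)
    assert [a + b for a, b in zip(u, v)] == xs
    # The splitting from the proof attains the minimum in (IV.9).
    attained.append(ky_fan(u, n) + k * ky_fan(v, 1) == ky_fan(x, k))
    # Any other splitting x_down = u2 + v2 can only give a larger (or equal) value.
    for _ in range(100):
        u2 = [rng.randint(-10, 10) for _ in range(n)]
        v2 = [a - b for a, b in zip(xs, u2)]
        bounded.append(ky_fan(u2, n) + k * ky_fan(v2, 1) >= ky_fan(x, k))
```

Integer data are used on purpose so that the equality in (IV.9) can be tested with `==` rather than with a floating-point tolerance.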


We now derive some basic inequalities. If f is a convex function on an 
interval I and if $a_i$, $i = 1, 2, \ldots, n$, are nonnegative real numbers such that 
$\sum_{i=1}^n a_i = 1$, then 

$$f\Bigl(\sum_{i=1}^n a_i t_i\Bigr) \le \sum_{i=1}^n a_i f(t_i) \quad \text{for all } t_i \in I.$$

Applying this to the function $f(t) = -\log t$ on the interval $(0, \infty)$, one 
obtains the fundamental inequality 

$$\prod_{i=1}^n t_i^{a_i} \le \sum_{i=1}^n a_i t_i \quad \text{if } t_i \ge 0,\ a_i \ge 0,\ \sum a_i = 1. \tag{IV.10}$$

This is called the (weighted) arithmetic-geometric mean 
inequality. The special choice $a_1 = a_2 = \cdots = a_n = \frac{1}{n}$ gives the usual 
arithmetic-geometric mean inequality 

$$\Bigl(\prod_{i=1}^n t_i\Bigr)^{1/n} \le \frac{1}{n} \sum_{i=1}^n t_i \quad \text{if } t_i \ge 0. \tag{IV.11}$$
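
As a quick sanity check of (IV.10), the sketch below evaluates both sides on randomly generated data. The function name `weighted_am_gm_gap`, the random seed and the sampling ranges are arbitrary choices made for this illustration.

```python
import math
import random

def weighted_am_gm_gap(t, a):
    """Return sum(a_i * t_i) - prod(t_i ** a_i) for weights a_i >= 0 summing to 1."""
    arithmetic = sum(ai * ti for ai, ti in zip(a, t))
    geometric = math.prod(ti ** ai for ai, ti in zip(a, t))
    return arithmetic - geometric

rng = random.Random(0)
gaps = []
for _ in range(200):
    n = rng.randint(1, 6)
    t = [rng.uniform(0.0, 10.0) for _ in range(n)]
    raw = [rng.uniform(0.1, 1.0) for _ in range(n)]
    total = sum(raw)
    a = [r / total for r in raw]      # normalise so that the weights sum to 1
    gaps.append(weighted_am_gm_gap(t, a))

# (IV.10) says every gap is nonnegative, up to rounding error.
print(min(gaps) >= -1e-9)
```

With equal weights and equal entries the gap vanishes, for example `weighted_am_gm_gap([4.0, 4.0], [0.5, 0.5])`, which is the equality case of (IV.11).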


Theorem IV.1.6 Let p, q be real numbers with $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. Let 
$x, y \in \mathbb{R}^n$. Then for every symmetric gauge function $\Phi$ 

$$\Phi(|x \cdot y|) \le [\Phi(|x|^p)]^{1/p}\,[\Phi(|y|^q)]^{1/q}. \tag{IV.12}$$

Proof. From the inequality (IV.10) one obtains 

$$|x \cdot y| \le \frac{|x|^p}{p} + \frac{|y|^q}{q},$$

and hence 

$$\Phi(|x \cdot y|) \le \frac{1}{p}\,\Phi(|x|^p) + \frac{1}{q}\,\Phi(|y|^q). \tag{IV.13}$$

For $t > 0$, if we replace x, y by $tx$ and $t^{-1}y$, then the left-hand side of 
(IV.13) does not change. Hence, 

$$\Phi(|x \cdot y|) \le \min_{t > 0} \Bigl[\frac{t^p}{p}\,\Phi(|x|^p) + \frac{1}{q\,t^q}\,\Phi(|y|^q)\Bigr]. \tag{IV.14}$$

But, if 

$$\varphi(t) = \frac{t^p}{p}\,a + \frac{1}{q\,t^q}\,b, \quad \text{where } t, a, b > 0,$$

then plain differentiation shows that 

$$\min \varphi(t) = a^{1/p}\,b^{1/q}.$$

So, (IV.12) follows from (IV.14). ∎ 
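
For readers who want the differentiation step written out, here is one way to carry it out. The symbol $t_0$ for the critical point is introduced only for this sketch; the computation uses nothing beyond $\frac{1}{p} + \frac{1}{q} = 1$, equivalently $p + q = pq$.

```latex
\begin{align*}
\varphi(t) &= \frac{t^{p}}{p}\,a + \frac{1}{q\,t^{q}}\,b,
  \qquad \varphi'(t) = t^{p-1}a - t^{-q-1}b, \\
\varphi'(t_0) = 0 \;&\Longleftrightarrow\; t_0^{\,p+q} = \frac{b}{a}
  \;\Longleftrightarrow\; t_0^{\,p} = \Bigl(\frac{b}{a}\Bigr)^{1/q},
  \quad t_0^{-q} = \Bigl(\frac{a}{b}\Bigr)^{1/p}, \\
\varphi(t_0) &= \frac{1}{p}\,a^{1-1/q}\,b^{1/q} + \frac{1}{q}\,a^{1/p}\,b^{1-1/p}
  = \Bigl(\frac{1}{p} + \frac{1}{q}\Bigr) a^{1/p}\,b^{1/q}
  = a^{1/p}\,b^{1/q}.
\end{align*}
```

Since $\varphi(t) \to \infty$ both as $t \to 0^+$ and as $t \to \infty$, the unique critical point $t_0$ is indeed the minimum.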


When $\Phi = \Phi_{(n)}$, (IV.12) reduces to the familiar Hölder inequality 

$$\sum_{i=1}^n |x_i y_i| \le \Bigl(\sum_{i=1}^n |x_i|^p\Bigr)^{1/p} \Bigl(\sum_{i=1}^n |y_i|^q\Bigr)^{1/q}.$$

We will refer to (IV.12) as the Hölder inequality for symmetric gauge 
functions. The special case $p = 2$ will be called the Cauchy-Schwarz 
inequality for symmetric gauge functions. 
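
To see (IV.12) at work for a symmetric gauge function other than the $l_1$-norm, one can take $\Phi = \Phi_{(k)}$ from (IV.8). The sketch below is illustrative only: `ky_fan` and `holder_sides` are names chosen here, and the random data, seed and tolerance are arbitrary.

```python
import random

def ky_fan(x, k):
    """Phi_(k)(x): the sum of the k largest absolute values of the coordinates."""
    return sum(sorted((abs(t) for t in x), reverse=True)[:k])

def holder_sides(x, y, p, k):
    """Left and right sides of (IV.12) for Phi = Phi_(k), with 1/p + 1/q = 1."""
    q = p / (p - 1)
    lhs = ky_fan([a * b for a, b in zip(x, y)], k)
    rhs = (ky_fan([abs(a) ** p for a in x], k) ** (1 / p)
           * ky_fan([abs(b) ** q for b in y], k) ** (1 / q))
    return lhs, rhs

rng = random.Random(1)
worst = float("inf")          # smallest observed value of rhs - lhs
for _ in range(300):
    n = rng.randint(1, 6)
    x = [rng.uniform(-5, 5) for _ in range(n)]
    y = [rng.uniform(-5, 5) for _ in range(n)]
    p = rng.uniform(1.1, 4.0)
    for k in range(1, n + 1):
        lhs, rhs = holder_sides(x, y, p, k)
        worst = min(worst, rhs - lhs)

print(worst >= -1e-9)         # (IV.12) predicts rhs - lhs >= 0 up to rounding
```

For $k = n$ this is the classical Hölder inequality; for $k = 1$ it reduces to $\max_i |x_i y_i| \le \max_i |x_i| \cdot \max_i |y_i|$.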


Exercise IV.1.7 Let p, q, r be positive real numbers with $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$. Show 
that for every symmetric gauge function $\Phi$ we have 

$$[\Phi(|x \cdot y|^r)]^{1/r} \le [\Phi(|x|^p)]^{1/p}\,[\Phi(|y|^q)]^{1/q}. \tag{IV.15}$$

Theorem IV.1.8 Let $\Phi$ be any symmetric gauge function and let $p \ge 1$. 
Then for all $x, y \in \mathbb{R}^n$ 

$$[\Phi(|x + y|^p)]^{1/p} \le [\Phi(|x|^p)]^{1/p} + [\Phi(|y|^p)]^{1/p}. \tag{IV.16}$$

Proof. When $p = 1$, the inequality (IV.16) is a consequence of the triangle 
inequalities for the absolute value on $\mathbb{R}^n$ and for the norm $\Phi$. Let $p > 1$. 
It is enough to consider the case $x \ge 0$, $y \ge 0$. Make this assumption and 
write 

$$(x + y)^p = x \cdot (x + y)^{p-1} + y \cdot (x + y)^{p-1}.$$

Now, using the triangle inequality for $\Phi$ and Theorem IV.1.6, one obtains 

$$\begin{aligned}
\Phi((x + y)^p) &\le \Phi(x \cdot (x + y)^{p-1}) + \Phi(y \cdot (x + y)^{p-1}) \\
&\le [\Phi(x^p)]^{1/p}\,[\Phi((x + y)^{q(p-1)})]^{1/q} + [\Phi(y^p)]^{1/p}\,[\Phi((x + y)^{q(p-1)})]^{1/q} \\
&= \{[\Phi(x^p)]^{1/p} + [\Phi(y^p)]^{1/p}\}\,[\Phi((x + y)^p)]^{1/q},
\end{aligned}$$

since $q(p - 1) = p$. If we divide both sides of the above inequality by 
$[\Phi((x + y)^p)]^{1/q}$, we get (IV.16). ∎ 


Once again, when $\Phi = \Phi_{(n)}$ the inequality (IV.16) reduces to the familiar 
Minkowski inequality. So, we will call (IV.16) the Minkowski inequality 
for symmetric gauge functions. 


Exercise IV.1.9 Let $\Phi$ be a symmetric gauge function and let $p \ge 1$. Let 

$$\Phi^{(p)}(x) = [\Phi(|x|^p)]^{1/p}. \tag{IV.17}$$

Show that $\Phi^{(p)}$ is also a symmetric gauge function. 
Note that, if $\Phi_p$ is the family of $l_p$-norms, then 

$$\Phi_{p_1}^{(p_2)} = \Phi_{p_1 p_2} \quad \text{for all } p_1, p_2 \ge 1, \tag{IV.18}$$

and, if $\Phi_{(k)}$ is the norm defined by (IV.8), then 

$$\Phi_{(k)}^{(p)}(x) = \Bigl(\sum_{j=1}^k |x_j|^p\Bigr)^{1/p}, \tag{IV.19}$$

where the coordinates of x are arranged as $|x_1| \ge |x_2| \ge \cdots \ge |x_n|$. 


Just as among the $l_p$-norms, the Euclidean norm has especially interesting 
properties, the norms $\Phi^{(2)}$ where $\Phi$ is any symmetric gauge function 
have some special interest. We will give these norms a name: 


Definition IV.1.10 $\Psi$ is called a quadratic symmetric gauge function, 
or a Q-norm, if $\Psi = \Phi^{(2)}$ for some symmetric gauge function $\Phi$. In 
other words, 

$$\Psi(x) = [\Phi(|x|^2)]^{1/2}. \tag{IV.20}$$

Exercise IV.1.11 (i) Show that an $l_p$-norm is a Q-norm if and only if 
$p \ge 2$. 

(ii) More generally, show that for each $k = 1, 2, \ldots, n$, $\Phi_{(k)}^{(p)}$ is a Q-norm 
if and only if $p \ge 2$. 

Exercise IV.1.12 We saw earlier that if $\Phi_{(k)}(x) \le \Phi_{(k)}(y)$ for all 
$k = 1, 2, \ldots, n$, then $\Phi(x) \le \Phi(y)$ for all symmetric gauge functions. Show that 
if $\Phi_{(k)}^{(2)}(x) \le \Phi_{(k)}^{(2)}(y)$ for all $k = 1, 2, \ldots, n$, then $\Phi^{(2)}(x) \le \Phi^{(2)}(y)$ for all 
symmetric gauge functions $\Phi$; i.e., $\Psi(x) \le \Psi(y)$ for all quadratic symmetric 
gauge functions. 


If $\Phi$ is a norm on $\mathbb{C}^n$, the dual of $\Phi$ is defined as 

$$\Phi'(x) = \sup_{\Phi(y) = 1} |\langle x, y \rangle|. \tag{IV.21}$$

It is easy to see that $\Phi'$ is a norm. (In fact, $\Phi'$ is a norm even when $\Phi$ is 
a function on $\mathbb{C}^n$ that does not necessarily satisfy the triangle inequality 
but meets the other requirements of a norm.) 
Exercise IV.1.13 If $\Phi$ is a symmetric gauge function then so is $\Phi'$. 


Exercise IV.1.14 Show that for any norm $\Phi$ 

$$|\langle x, y \rangle| \le \Phi(x)\,\Phi'(y) \quad \text{for all } x, y. \tag{IV.22}$$

Exercise IV.1.15 Let $\Phi_p$ be the $l_p$-norm, $1 \le p \le \infty$. Show that 

$$\Phi_p' = \Phi_q, \quad \text{where } \frac{1}{p} + \frac{1}{q} = 1. \tag{IV.23}$$

Exercise IV.1.16 Let $\Phi$ and $\Psi$ be two norms such that 

$$\Phi(x) \le c\,\Psi(x) \quad \text{for all } x \text{ and for some } c > 0.$$

Show that 

$$\Phi'(x) \ge c^{-1}\,\Psi'(x) \quad \text{for all } x.$$

We shall call a symmetric gauge function a Q'-norm if it is the dual of 
a Q-norm. The $l_p$-norms for $1 \le p \le 2$ are examples of Q'-norms. 


Exercise IV.1.17 (i) Let $\Phi$ be a norm such that $\Phi = \Phi'$. Then $\Phi$ must be 
the Euclidean norm. 

(ii) Let $\Phi$ be both a Q-norm and a Q'-norm. Then $\Phi$ must be the 
Euclidean norm. (Use Exercise IV.1.16 and the fact that every symmetric 
gauge function is bounded by the $l_1$-norm.) 


Exercise IV.1.18 For each $k = 1, 2, \ldots, n$, the dual of the norm $\Phi_{(k)}$ is 
given by 

$$\Phi_{(k)}'(x) = \max\Bigl\{\Phi_{(1)}(x),\ \frac{1}{k}\,\Phi_{(n)}(x)\Bigr\}. \tag{IV.24}$$

Prove this using Proposition IV.1.5 and Exercise IV.1.16. 
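
Formula (IV.24) can be explored numerically before it is proved. The sketch below is not a proof and its names (`ky_fan_dual`, `extremal_vectors`) are chosen here for illustration. It checks the inequality (IV.22) with the right-hand side of (IV.24) in the role of the dual norm, and it exhibits two vectors y with $\Phi_{(k)}(y) = 1$ at which $\langle x, y \rangle$ equals the two terms inside the maximum.

```python
import random

def ky_fan(x, k):
    """Phi_(k)(x): the sum of the k largest absolute values of the coordinates."""
    return sum(sorted((abs(t) for t in x), reverse=True)[:k])

def ky_fan_dual(x, k):
    """Right-hand side of (IV.24): max{Phi_(1)(x), Phi_(n)(x) / k}."""
    return max(ky_fan(x, 1), ky_fan(x, len(x)) / k)

def inner(x, y):
    """Real inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def sign(t):
    return 1.0 if t >= 0 else -1.0

def extremal_vectors(x, k):
    """Two vectors y with Phi_(k)(y) = 1 giving <x, y> = Phi_(1)(x) and Phi_(n)(x) / k."""
    i = max(range(len(x)), key=lambda j: abs(x[j]))
    y_peak = [sign(x[j]) if j == i else 0.0 for j in range(len(x))]
    y_flat = [sign(t) / k for t in x]
    return y_peak, y_flat

x = [3, -7, 2, 5, -1]                     # arbitrary example vector
attained = []
for k in (2, 4):
    y_peak, y_flat = extremal_vectors(x, k)
    best = max(inner(x, y_peak), inner(x, y_flat))
    attained.append(abs(best - ky_fan_dual(x, k)) < 1e-12)

rng = random.Random(3)
worst = float("inf")                      # smallest slack observed in (IV.22)
for _ in range(500):
    n = rng.randint(1, 6)
    k = rng.randint(1, n)
    a = [rng.uniform(-5, 5) for _ in range(n)]
    b = [rng.uniform(-5, 5) for _ in range(n)]
    worst = min(worst, ky_fan(a, k) * ky_fan_dual(b, k) - abs(inner(a, b)))
```

For the sample vector, $k = 2$ gives the value 9 (the averaged $l_1$ term dominates) and $k = 4$ gives 7 (the $l_\infty$ term dominates).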


Some ways of generating symmetric gauge functions are described in the 
following exercises. 


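Exercise IV.1.19 Let $1 = a_1 \ge a_2 \ge \cdots \ge a_n \ge 0$. Given a symmetric 
gauge function $\Phi$ on $\mathbb{R}^n$, define 

$$\Psi(x) = \Phi(a_1 |x|_1^{\downarrow}, \ldots, a_n |x|_n^{\downarrow}).$$

Then $\Psi$ is a symmetric gauge function. 


Exercise IV.1.20 (i) Let $\Phi$ be a symmetric gauge function on $\mathbb{R}^n$. Let 
$m < n$. If $x \in \mathbb{R}^m$, let $\tilde{x} = (x_1, \ldots, x_m, 0, 0, \ldots, 0)$ and define $\Psi(x) = \Phi(\tilde{x})$. 
Then $\Psi$ is a symmetric gauge function on $\mathbb{R}^m$. 

(ii) Conversely, given any symmetric gauge function $\Psi$ on $\mathbb{R}^m$, if for 
$n > m$ we define $\Phi(x_1, \ldots, x_n) = \Psi(|x|_1^{\downarrow}, \ldots, |x|_m^{\downarrow})$, then $\Phi$ is a symmetric 
gauge function on $\mathbb{R}^n$. 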
IV.2  Unitarily Invariant Norms on Operators on $\mathbb{C}^n$ 


In this section, $\mathbb{C}^n$ will always stand for the Hilbert space $\mathbb{C}^n$ with inner 
product $\langle \cdot, \cdot \rangle$ and the associated norm $\|\cdot\|$. (No subscript will be attached to 
this “standard” norm as was done in the previous Section.) If A is a linear 
operator on $\mathbb{C}^n$, we will denote by $\|A\|$ the operator (bound) norm of 
A defined as 

$$\|A\| = \sup_{\|x\| = 1} \|Ax\|. \tag{IV.25}$$

As before, we denote by $|A|$ the positive operator $(A^*A)^{1/2}$ and by $s(A)$ 
the vector whose coordinates are the singular values of A, arranged as 
$s_1(A) \ge s_2(A) \ge \cdots \ge s_n(A)$. We have 

$$\|A\| = \|\,|A|\,\| = s_1(A). \tag{IV.26}$$

Now, if U, V are unitary operators on $\mathbb{C}^n$, then $|UAV| = V^*|A|V$ and hence 

$$\|A\| = \|UAV\| \tag{IV.27}$$

for all unitary operators U, V. This last property is called unitary 
invariance. Several other norms have this property. These are frequently useful 
in analysis, and we will study them in some detail. 


We will use the symbol $|||\cdot|||$ to mean a norm on $n \times n$ matrices that 
satisfies 

$$|||UAV||| = |||A||| \tag{IV.28}$$

for all A and for unitary U, V. We will call such a norm a unitarily 
invariant norm on the space M(n) of $n \times n$ matrices. We will normalise 
such norms so that they all take the value 1 on the matrix diag(1, 0, ..., 0). 

There is an intimate connection between these norms and symmetric 
gauge functions on $\mathbb{R}^n$; the link is provided by singular values. 


Theorem IV.2.1 Given a symmetric gauge function $\Phi$ on $\mathbb{R}^n$, define a 
function on M(n) as 

$$|||A|||_\Phi = \Phi(s(A)). \tag{IV.29}$$

Then this defines a unitarily invariant norm on M(n). Conversely, given 
any unitarily invariant norm $|||\cdot|||$ on M(n), define a function on $\mathbb{R}^n$ by 

$$\Phi_{|||\cdot|||}(x) = |||\mathrm{diag}(x)|||, \tag{IV.30}$$

where diag(x) is the diagonal matrix with entries $x_1, \ldots, x_n$ on its diagonal. 
Then this defines a symmetric gauge function on $\mathbb{R}^n$. 


Proof. Since $s(UAV) = s(A)$ for all unitary U, V, $|||\cdot|||_\Phi$ is unitarily 
invariant. We will prove that it obeys the triangle inequality; the other 
conditions for it to be a norm are easy to verify. For this, recall the 
majorisation (II.18) 

$$s(A + B) \prec_w s(A) + s(B) \quad \text{for all } A, B \in M(n),$$

and then use the fact that $\Phi$ is strongly isotone and monotone. (See 
Example II.3.13 and Problem II.5.11.) To prove the converse, note that (IV.30) 
clearly gives a norm on $\mathbb{R}^n$. Since diagonal matrices of the form 
$\mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n})$ and permutation matrices are all unitary, this norm 
is absolute and permutation invariant, and hence it is a symmetric gauge 
function. ∎ 


Symmetric gauge functions on $\mathbb{R}^n$ constructed in the preceding section 
thus lead to several examples of unitarily invariant norms on M(n). Two 
classes of such norms are specially important. The first is the class of 
Schatten p-norms defined as 

$$\|A\|_p = \Phi_p(s(A)) = \Bigl[\sum_{j=1}^n (s_j(A))^p\Bigr]^{1/p}, \quad 1 \le p < \infty, \tag{IV.31}$$

$$\|A\|_\infty = \Phi_\infty(s(A)) = s_1(A) = \|A\|. \tag{IV.32}$$

The second is the class of Ky Fan k-norms defined as 

$$\|A\|_{(k)} = \sum_{j=1}^k s_j(A), \quad 1 \le k \le n. \tag{IV.33}$$

Among the p-norms, the ones for the values $p = 1, 2, \infty$ are used most often. 
As we have noted, $\|A\|_\infty$ is the same as the operator norm $\|A\|$ and the Ky 
Fan norm $\|A\|_{(1)}$. The norm $\|A\|_1$ is the same as $\|A\|_{(n)}$. This is equal to 
$\mathrm{tr}(|A|)$ and hence is called the trace norm, and is sometimes written as 
$\|A\|_{\mathrm{tr}}$. The norm 

$$\|A\|_2 = \Bigl[\sum_{j=1}^n (s_j(A))^2\Bigr]^{1/2} \tag{IV.34}$$

is also called the Hilbert-Schmidt norm or the Frobenius norm (and 
is sometimes written as $\|A\|_F$ for that reason). It will play a basic role in 
our analysis. For $A, B \in M(n)$ let 

$$\langle A, B \rangle = \mathrm{tr}\, A^*B. \tag{IV.35}$$

This defines an inner product on M(n) and the norm associated with this 
inner product is $\|A\|_2$, i.e., 

$$\|A\|_2 = (\mathrm{tr}\, A^*A)^{1/2}. \tag{IV.36}$$

If the matrix A has entries $a_{ij}$, then 

$$\|A\|_2 = \Bigl(\sum_{i,j} |a_{ij}|^2\Bigr)^{1/2}. \tag{IV.37}$$

Thus the norm $\|A\|_2$ is the Euclidean norm of the matrix A when it is 
thought of as an element of $\mathbb{C}^{n^2}$. This fact makes this norm easily 
computable and geometrically tractable. 
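
For $2 \times 2$ matrices these norms can be computed without any linear algebra library, because the two singular values are determined by the relations $s_1^2 + s_2^2 = \sum |a_{ij}|^2$ and $s_1 s_2 = |\det A|$. The sketch below uses this closed form; the function names and the sample matrix are choices made for illustration. It compares (IV.34) with (IV.37) and evaluates a few Schatten norms.

```python
import math

def singular_values_2x2(a):
    """Singular values s1 >= s2 of a 2x2 (real or complex) matrix given as nested tuples.

    Uses (s1 + s2)^2 = f + 2d and (s1 - s2)^2 = f - 2d, where f is the sum of
    the squared moduli of the entries and d = |det A|.
    """
    (a11, a12), (a21, a22) = a
    f = abs(a11) ** 2 + abs(a12) ** 2 + abs(a21) ** 2 + abs(a22) ** 2
    d = abs(a11 * a22 - a12 * a21)
    plus = math.sqrt(f + 2 * d)                # s1 + s2
    minus = math.sqrt(max(f - 2 * d, 0.0))     # s1 - s2 (max guards against rounding)
    return (plus + minus) / 2, (plus - minus) / 2

def schatten(a, p):
    """Schatten p-norm (IV.31), with p = math.inf giving the operator norm (IV.32)."""
    s1, s2 = singular_values_2x2(a)
    return s1 if p == math.inf else (s1 ** p + s2 ** p) ** (1 / p)

def frobenius_from_entries(a):
    """Right-hand side of (IV.37)."""
    return math.sqrt(sum(abs(z) ** 2 for row in a for z in row))

A = ((3, 0), (4, 5))                           # sample matrix
s1, s2 = singular_values_2x2(A)
print(s1, s2)                                  # 3*sqrt(5) and sqrt(5), up to rounding
print(schatten(A, 2), frobenius_from_entries(A))
```

For this sample matrix the operator norm is $3\sqrt{5}$, the trace norm is $4\sqrt{5}$, and both expressions for the Frobenius norm give $\sqrt{50}$.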


The main importance of the Ky Fan norms lies in the following: 


Theorem IV.2.2 (Fan Dominance Theorem) Let A, B be two $n \times n$ 
matrices. If 

$$\|A\|_{(k)} \le \|B\|_{(k)} \quad \text{for } k = 1, 2, \ldots, n,$$

then 

$$|||A||| \le |||B||| \quad \text{for all unitarily invariant norms.}$$

Proof. This is a consequence of the corresponding assertion about 
symmetric gauge functions. (See Example IV.1.4.) ∎ 


Since $\Phi_{(1)}(x) \le \Phi(x) \le \Phi_{(n)}(x)$ for all $x \in \mathbb{R}^n$ and for all symmetric 
gauge functions $\Phi$, we have 

$$\|A\| \le |||A||| \le \|A\|_{(n)} = \|A\|_1 \tag{IV.38}$$

for all $A \in M(n)$ and for all unitarily invariant norms $|||\cdot|||$. 
Analogous to Proposition IV.1.5 we have 


Proposition IV.2.3 For each $k = 1, 2, \ldots, n$, 

$$\|A\|_{(k)} = \min\{\|B\|_{(n)} + k\|C\| : A = B + C\}. \tag{IV.39}$$

Proof. If $A = B + C$, then $\|A\|_{(k)} \le \|B\|_{(k)} + \|C\|_{(k)} \le \|B\|_{(n)} + k\|C\|$. 
Now let $s(A) = (s_1, \ldots, s_n)$ and choose unitary U, V so that 

$$A = U[\mathrm{diag}(s_1, \ldots, s_n)]V.$$

Let 

$$\begin{aligned}
B &= U[\mathrm{diag}(s_1 - s_k,\ s_2 - s_k,\ \ldots,\ s_k - s_k,\ 0, \ldots, 0)]V, \\
C &= U[\mathrm{diag}(s_k, s_k, \ldots, s_k,\ s_{k+1}, \ldots, s_n)]V.
\end{aligned}$$

Then 

$$\begin{aligned}
A &= B + C, \\
\|B\|_{(n)} &= \sum_{j=1}^k s_j - k s_k = \|A\|_{(k)} - k s_k, \\
\|C\| &= s_k,
\end{aligned}$$

and 

$$\|A\|_{(k)} = \|B\|_{(n)} + k\|C\|. \qquad ∎$$
A norm $\nu$ on M(n) is called symmetric if for A, B, C in M(n) 

$$\nu(BAC) \le \|B\|\,\nu(A)\,\|C\|. \tag{IV.40}$$

Proposition IV.2.4 A norm on M(n) is symmetric if and only if it is 
unitarily invariant. 


Proof. If $\nu$ is a symmetric norm, then for unitary U, V we have $\nu(UAV) \le \nu(A)$ 
and $\nu(A) = \nu(U^{-1}UAVV^{-1}) \le \nu(UAV)$. So, $\nu$ is unitarily invariant. 
Conversely, by Problem III.6.2, $s_j(BAC) \le \|B\|\,\|C\|\,s_j(A)$ for all 
$j = 1, 2, \ldots, n$. So, if $\Phi$ is any symmetric gauge function, then 
$\Phi(s(BAC)) \le \|B\|\,\|C\|\,\Phi(s(A))$ and hence the norm associated with $\Phi$ is symmetric. ∎ 


In particular, this implies that every unitarily invariant norm is 
submultiplicative: 

$$|||AB||| \le |||A|||\;|||B||| \quad \text{for all } A, B.$$


Inequalities for sums and products of singular values of matrices, when 
combined with inequalities for symmetric gauge functions proved in Section 
IV.1, lead to interesting statements about unitarily invariant norms. This 
is illustrated below. 


Theorem IV.2.5 If A, B are $n \times n$ matrices, then 

$$s^r(AB) \prec_w s^r(A)\,s^r(B) \quad \text{for all } r > 0. \tag{IV.41}$$

Proof. If $\wedge^k A$ is the kth antisymmetric tensor product of A, then 

$$\|\wedge^k A\| = s_1(\wedge^k A) = \prod_{j=1}^k s_j(A), \quad 1 \le k \le n.$$

Hence, 

$$\prod_{j=1}^k s_j^r(AB) = \|\wedge^k (AB)\|^r \le (\|\wedge^k A\|\,\|\wedge^k B\|)^r = \prod_{j=1}^k s_j^r(A)\,s_j^r(B), \quad 1 \le k \le n.$$

Now use the statement II.3.5(vii). ∎ 
Corollary IV.2.6 (Hölder's Inequality for Unitarily Invariant Norms) For 
every unitarily invariant norm and for all $A, B \in M(n)$ 

$$|||AB||| \le |||\,|A|^p\,|||^{1/p}\;|||\,|B|^q\,|||^{1/q} \tag{IV.42}$$

for all $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. 

Proof. Use the special case of (IV.41) for $r = 1$ to get 

$$\Phi(s(AB)) \le \Phi(s(A)\,s(B))$$

for every symmetric gauge function. Now use Theorem IV.1.6 and the fact 
that $(s(A))^p = s(|A|^p)$. ∎ 


Exercise IV.2.7 Let p, q, r be positive real numbers with $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$. Then 
for every unitarily invariant norm 

$$|||\,|AB|^r\,|||^{1/r} \le |||\,|A|^p\,|||^{1/p}\;|||\,|B|^q\,|||^{1/q}. \tag{IV.43}$$

Choosing $p = q = 1$, one gets from this 

$$|||\,|AB|^{1/2}\,||| \le (|||A|||\;|||B|||)^{1/2}. \tag{IV.44}$$

This is the Cauchy-Schwarz inequality for unitarily invariant norms. 


Exercise IV.2.8 Given a unitarily invariant norm $|||\cdot|||$ on M(n), define 

$$|||A|||^{(p)} = |||\,|A|^p\,|||^{1/p}, \quad 1 \le p < \infty. \tag{IV.45}$$

Show that this is a unitarily invariant norm. Note that 

$$\|A\|_{p_1}^{(p_2)} = \|A\|_{p_1 p_2} \quad \text{for all } p_1, p_2 \ge 1 \tag{IV.46}$$

and 

$$\|A\|_{(k)}^{(p)} = \Bigl(\sum_{j=1}^k s_j^p(A)\Bigr)^{1/p} \quad \text{for } p \ge 1,\ 1 \le k \le n. \tag{IV.47}$$

Definition IV.2.9 A unitarily invariant norm on M(n) is called a 
Q-norm if it corresponds to a quadratic symmetric gauge function; i.e., $|||\cdot|||$ 
is a Q-norm if and only if there exists a unitarily invariant norm $|||\cdot|||^{\wedge}$ 
such that 

$$|||A|||^2 = |||A^*A|||^{\wedge}. \tag{IV.48}$$

Note that the norm $\|\cdot\|_p$ is a Q-norm if and only if $p \ge 2$ because 

$$\|A\|_p^2 = \|A^*A\|_{p/2}. \tag{IV.49}$$

The norms defined in (IV.47) are Q-norms if and only if $p \ge 2$. 
Exercise IV.2.10 Let $\|\cdot\|_Q$ denote a Q-norm. Observe that the following 
conditions are equivalent: 

(i) $\|A\|_Q \le \|B\|_Q$ for all Q-norms. 

(ii) $|||A^*A||| \le |||B^*B|||$ for all unitarily invariant norms. 

(iii) $\|A\|_{(k)}^{(2)} \le \|B\|_{(k)}^{(2)}$ for $k = 1, 2, \ldots, n$. 

(iv) $(s(A))^2 \prec_w (s(B))^2$. 


Duality in the space of unitarily invariant norms is defined via the inner 
product (IV.35). If $|||\cdot|||$ is a unitarily invariant norm, define $|||\cdot|||'$ as 

$$|||A|||' = \sup_{|||B||| = 1} |\langle A, B \rangle| = \sup_{|||B||| = 1} |\mathrm{tr}\, A^*B|. \tag{IV.50}$$

It is easy to see that this defines a norm that is unitarily invariant. 


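Proposition IV.2.11 Let $\Phi$ be a symmetric gauge function on $\mathbb{R}^n$ and 
let $\|\cdot\|_\Phi$ be the corresponding unitarily invariant norm on M(n). Then 
$\|\cdot\|_\Phi' = \|\cdot\|_{\Phi'}$. 


Proof. We have from (II.40) and (IV.41) 

$$|\mathrm{tr}\, A^*B| \le \mathrm{tr}\,|A^*B| = \sum_{j=1}^n s_j(A^*B) \le \sum_{j=1}^n s_j(A)\,s_j(B).$$

It follows that 

$$\|A\|_\Phi' \le \Phi'(s(A)) = \|A\|_{\Phi'}.$$

Conversely, 

$$\begin{aligned}
\|A\|_{\Phi'} &= \Phi'(s(A)) \\
&= \sup\Bigl\{\sum_{j=1}^n s_j(A)\,y_j : y \in \mathbb{R}^n,\ \Phi(y) = 1\Bigr\} \\
&= \sup\{\mathrm{tr}[\mathrm{diag}(s(A))\,\mathrm{diag}(y)] : \|\mathrm{diag}(y)\|_\Phi = 1\} \\
&\le \|\mathrm{diag}(s(A))\|_\Phi' = \|A\|_\Phi'. \qquad ∎
\end{aligned}$$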


Exercise IV.2.12 From statements about duals proved in Section IV.1, 
we can now conclude that 

(i) $|\mathrm{tr}\, A^*B| \le |||A||| \cdot |||B|||'$ for every unitarily invariant norm. 

(ii) $\|A\|_p' = \|A\|_q$ for $1 \le p \le \infty$, $\frac{1}{p} + \frac{1}{q} = 1$. 

(iii) $\|A\|_{(k)}' = \max\{\|A\|_{(1)},\ \frac{1}{k}\|A\|_{(n)}\}$, $1 \le k \le n$. 

(iv) The only unitarily invariant norm that is its own dual is the 
Hilbert-Schmidt norm $\|\cdot\|_2$. 

(v) The only norm that is a Q-norm and is also the dual of a Q-norm is 
the norm $\|\cdot\|_2$. 


Duals of Q-norms will be called Q'-norms. These include the norms 
$\|\cdot\|_p$, $1 \le p \le 2$. 


An important property of all unitarily invariant norms is that they are all 
reduced by pinchings. If $P_1, \ldots, P_k$ are mutually orthogonal projections 
such that $P_1 \oplus P_2 \oplus \cdots \oplus P_k = I$, then the operator on M(n) defined as 

$$\mathcal{C}(A) = \sum_{j=1}^k P_j A P_j \tag{IV.51}$$

is called a pinching operator. It is easy to see that 

$$|||\mathcal{C}(A)||| \le |||A||| \tag{IV.52}$$

for every unitarily invariant norm. (See Problem II.5.5.) We will call this 
the pinching inequality. 
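
In the simplest case, $n = 2$ with the two coordinate projections, the pinching of A is just its diagonal part. By the Fan Dominance Theorem it suffices to compare the two Ky Fan norms, and this can be tried out numerically. The sketch below is illustrative: the closed-form singular values use $s_1^2 + s_2^2 = \sum |a_{ij}|^2$ and $s_1 s_2 = |\det A|$, and the function names, seed and tolerance are assumptions made here.

```python
import math
import random

def singular_values_2x2(a):
    """Singular values s1 >= s2 of a 2x2 (real or complex) matrix given as nested tuples."""
    (a11, a12), (a21, a22) = a
    f = abs(a11) ** 2 + abs(a12) ** 2 + abs(a21) ** 2 + abs(a22) ** 2
    d = abs(a11 * a22 - a12 * a21)
    plus = math.sqrt(f + 2 * d)                # s1 + s2
    minus = math.sqrt(max(f - 2 * d, 0.0))     # s1 - s2
    return (plus + minus) / 2, (plus - minus) / 2

def pinch(a):
    """Pinching (IV.51) by the coordinate projections P1 = diag(1, 0), P2 = diag(0, 1)."""
    return ((a[0][0], 0), (0, a[1][1]))

def ky_fan_norms(a):
    """The pair (||A||_(1), ||A||_(2)) of Ky Fan norms (IV.33)."""
    s1, s2 = singular_values_2x2(a)
    return s1, s1 + s2

def random_matrix(rng):
    def z():
        return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    return ((z(), z()), (z(), z()))

rng = random.Random(4)
smallest_slack = float("inf")
for _ in range(1000):
    a = random_matrix(rng)
    full, pinched = ky_fan_norms(a), ky_fan_norms(pinch(a))
    smallest_slack = min(smallest_slack, full[0] - pinched[0], full[1] - pinched[1])

print(smallest_slack >= -1e-7)   # (IV.52) for both Ky Fan norms, up to rounding
```

For the matrix with rows (1, 2) and (3, 4), the pinching has Ky Fan norms 4 and 5, while the matrix itself has approximately 5.465 and 5.831.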
Let us illustrate one use of this inequality. 


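Theorem IV.2.13 Let $A, B \in M(n)$. Then for every unitarily invariant 
norm on M(2n) 

$$\frac{1}{2}\,\Bigl|\Bigl|\Bigl| \begin{bmatrix} A + B & 0 \\ 0 & A + B \end{bmatrix} \Bigr|\Bigr|\Bigr| \le \Bigl|\Bigl|\Bigl| \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} \Bigr|\Bigr|\Bigr| \le \Bigl|\Bigl|\Bigl| \begin{bmatrix} |A| + |B| & 0 \\ 0 & 0 \end{bmatrix} \Bigr|\Bigr|\Bigr|. \tag{IV.53}$$

Proof. The first inequality follows easily from the observation that 
$\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$ and $\begin{bmatrix} B & 0 \\ 0 & A \end{bmatrix}$ are unitarily equivalent. 

If we prove the second inequality in the special case when A, B are 
positive, the general case follows easily. So, assume A, B are positive. Then 

$$\begin{bmatrix} A + B & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} A^{1/2} & B^{1/2} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} A^{1/2} & 0 \\ B^{1/2} & 0 \end{bmatrix},$$

where $A^{1/2}$, $B^{1/2}$ are the positive square roots of A, B. Since $T^*T$ and $TT^*$ 
are unitarily equivalent for every T, the matrix $\begin{bmatrix} A + B & 0 \\ 0 & 0 \end{bmatrix}$ is unitarily 
equivalent to 

$$\begin{bmatrix} A^{1/2} & 0 \\ B^{1/2} & 0 \end{bmatrix} \begin{bmatrix} A^{1/2} & B^{1/2} \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} A & A^{1/2}B^{1/2} \\ B^{1/2}A^{1/2} & B \end{bmatrix}.$$

But $\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$ is a pinching of this last matrix. ∎ 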


As a corollary we have: 


Theorem IV.2.14 (Rotfel'd) Let $f : \mathbb{R}_+ \to \mathbb{R}_+$ be a concave function with 
$f(0) = 0$. Then the function F on M(n) defined by 

$$F(A) = \sum_{j=1}^n f(s_j(A)) \tag{IV.54}$$

is subadditive. 


Proof. The second inequality in (IV.53) can be written as a majorisation 
in $\mathbb{R}^{2n}$: 

$$(s(A), s(B)) \prec_w (s(|A| + |B|), 0)$$

for all $A, B \in M(n)$. We also know that $s(|A| + |B|) \prec s(A) + s(B)$. Hence 

$$(s(A), s(B)) \prec (s(A) + s(B), 0).$$

Now proceed as in Problem II.5.12. ∎ 


Exercise IV.2.15 Let $|||\cdot|||$ be a unitarily invariant norm on M(n). For 
$m < n$ and $A \in M(m)$, define 

$$|||A|||' = \Bigl|\Bigl|\Bigl| \begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix} \Bigr|\Bigr|\Bigr|.$$

Show that $|||\cdot|||'$ defines a unitarily invariant norm on M(m). 


We will use this idea of “dilating” A and of going from M(n) to M(2n) 
in later chapters. Procedures given in Exercises IV.1.19 and IV.1.20 can be 
adapted to matrices to generate unitarily invariant norms. 


IV.3  Lidskii’s Theorem (Third Proof) 


Let $\lambda^{\downarrow}(A)$ denote the n-vector whose coordinates are the eigenvalues of a 
Hermitian matrix A arranged in decreasing order. Lidskii's Theorem, for 
which we gave two proofs in Section III.4, says that if A, B are Hermitian 
matrices, then we have the majorisation 

$$\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B) \prec \lambda(A - B). \tag{IV.55}$$

We will give another proof of this theorem now, using the easier ideas of 
Sections III.1 and III.2. 
Exercise IV.3.1 One corollary of Lidskii's Theorem is that, if A and B 
are any two matrices, then 

$$|s(A) - s(B)| \prec_w s(A - B). \tag{IV.56}$$

See Problem III.6.13. Conversely, show that if (IV.56) is known to be true 
for all matrices A, B, then we can derive from it the statement (IV.55). 
[Hint: Choose real numbers $\alpha$, $\beta$ such that $A + \alpha I \ge B + \beta I \ge 0$.] 


We will prove (IV.56) by a different argument. To prove this we need to 
prove that for each of the Ky Fan symmetric gauge functions $\Phi_{(k)}$, $1 \le k \le n$, 
we have the inequality 

$$\Phi_{(k)}(s(A) - s(B)) \le \Phi_{(k)}(s(A - B)). \tag{IV.57}$$

We will prove this for $\Phi_{(1)}$ and $\Phi_{(n)}$, and then use the interpolation formulas 
(IV.9) and (IV.39). 


For $\Phi_{(1)}$ this is easy. By Weyl's perturbation theorem (Corollary III.2.6) 
we have 

$$\max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|.$$

This can be proved easily by another argument also. For any j consider 
the subspaces spanned by $\{u_1, \ldots, u_j\}$ and $\{v_j, \ldots, v_n\}$, where $u_i, v_i$, 
$1 \le i \le n$, are eigenvectors of A and B corresponding to their eigenvalues $\lambda_i^{\downarrow}(A)$ 
and $\lambda_i^{\downarrow}(B)$, respectively. Since the dimensions of these two spaces add up 
to $n + 1$, they have a nonzero intersection. For a unit vector x in this 
intersection we have $\langle x, Ax \rangle \ge \lambda_j^{\downarrow}(A)$ and $\langle x, Bx \rangle \le \lambda_j^{\downarrow}(B)$. Hence, we 
have 

$$\|A - B\| \ge |\langle x, (A - B)x \rangle| \ge \lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B).$$

So, by symmetry 

$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|, \quad 1 \le j \le n.$$

From this, as before, we can get 

$$\max_j |s_j(A) - s_j(B)| \le \|A - B\|$$

for any two matrices A and B. This is the same as saying 

$$\Phi_{(1)}(s(A) - s(B)) \le \Phi_{(1)}(s(A - B)). \tag{IV.58}$$


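Let T be a Hermitian matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 
\lambda_{p+1} \ge \cdots \ge \lambda_n$, where $\lambda_p \ge 0 > \lambda_{p+1}$. Choose a unitary matrix U such 
that $T = UDU^*$, where D is the diagonal matrix $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. 
Let $D^+ = \mathrm{diag}(\lambda_1, \ldots, \lambda_p, 0, \ldots, 0)$ and $D^- = \mathrm{diag}(0, \ldots, 0, -\lambda_{p+1}, \ldots, -\lambda_n)$. Let 
$T^+ = UD^+U^*$, $T^- = UD^-U^*$. Then both $T^+$ and $T^-$ are positive 
matrices and 

$$T = T^+ - T^-. \tag{IV.59}$$

This is called the Jordan decomposition of T. 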
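Lemma IV.3.2 If A, B are $n \times n$ Hermitian matrices, then 

$$\sum_{j=1}^n |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|_{(n)}. \tag{IV.60}$$

Proof. Using the Jordan decomposition of $A - B$ we can write 

$$\|A - B\|_{(n)} = \mathrm{tr}(A - B)^+ + \mathrm{tr}(A - B)^-.$$

If we put 

$$C = A + (A - B)^- = B + (A - B)^+,$$

then $C \ge A$ and $C \ge B$. Hence, by Weyl's monotonicity principle, $\lambda_j^{\downarrow}(C) \ge 
\lambda_j^{\downarrow}(A)$ and $\lambda_j^{\downarrow}(C) \ge \lambda_j^{\downarrow}(B)$ for all j. From these inequalities it follows that 

$$|\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \lambda_j^{\downarrow}(2C) - \lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B).$$

Hence, 

$$\sum_{j=1}^n |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \mathrm{tr}(2C - A - B) = \|A - B\|_{(n)}. \qquad ∎$$

Corollary IV.3.3 For any two $n \times n$ matrices A, B we have 

$$\Phi_{(n)}(s(A) - s(B)) = \sum_{j=1}^n |s_j(A) - s_j(B)| \le \|A - B\|_{(n)}. \tag{IV.61}$$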


j=l 
Theorem IV.3.4 For $n \times n$ matrices A, B we have the majorisation 

$$|s(A) - s(B)| \prec_w s(A - B).$$

Proof. Choose any index $k = 1, 2, \ldots, n$ and fix it. By Proposition IV.2.3, 
there exist $X, Y \in M(n)$ such that 

$$A - B = X + Y$$

and 

$$\|A - B\|_{(k)} = \|X\|_{(n)} + k\|Y\|.$$

Define vectors $\alpha$, $\beta$ as 

$$\begin{aligned}
\alpha &= s(X + B) - s(B), \\
\beta &= s(A) - s(X + B).
\end{aligned}$$

Then 

$$s(A) - s(B) = \alpha + \beta.$$

Hence, by Proposition IV.1.5 (or Proposition IV.2.3 restricted to diagonal 
matrices) and by (IV.58) and (IV.61), we have 

$$\begin{aligned}
\Phi_{(k)}(s(A) - s(B)) &\le \Phi_{(n)}(\alpha) + k\,\Phi_{(1)}(\beta) \\
&= \Phi_{(n)}(s(X + B) - s(B)) + k\,\Phi_{(1)}(s(A) - s(X + B)) \\
&\le \|X\|_{(n)} + k\|A - (X + B)\| \\
&= \|X\|_{(n)} + k\|Y\| \\
&= \|A - B\|_{(k)}.
\end{aligned}$$

This proves the theorem. ∎ 


As we observed in Exercise IV.3.1, this theorem is equivalent to Lidskii's 
Theorem. 
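
For $n = 2$ the majorisation of Theorem IV.3.4 consists of just two inequalities, one for $\Phi_{(1)}$ and one for $\Phi_{(2)}$, and both can be tested on random complex matrices. The sketch below is a numerical illustration only; it relies on the closed form for $2 \times 2$ singular values ($s_1^2 + s_2^2 = \sum |a_{ij}|^2$, $s_1 s_2 = |\det A|$), and all names, the seed and the tolerance are choices made here.

```python
import math
import random

def singular_values_2x2(a):
    """Singular values s1 >= s2 of a 2x2 (real or complex) matrix given as nested tuples."""
    (a11, a12), (a21, a22) = a
    f = abs(a11) ** 2 + abs(a12) ** 2 + abs(a21) ** 2 + abs(a22) ** 2
    d = abs(a11 * a22 - a12 * a21)
    plus = math.sqrt(f + 2 * d)                # s1 + s2
    minus = math.sqrt(max(f - 2 * d, 0.0))     # s1 - s2
    return (plus + minus) / 2, (plus - minus) / 2

def subtract(a, b):
    """Entrywise difference A - B."""
    return tuple(tuple(x - y for x, y in zip(ra, rb)) for ra, rb in zip(a, b))

def weak_majorisation_gap(a, b):
    """Smallest slack over k = 1, 2 in Phi_(k)(|s(A) - s(B)|) <= Phi_(k)(s(A - B))."""
    sa, sb = singular_values_2x2(a), singular_values_2x2(b)
    diff = sorted((abs(sa[0] - sb[0]), abs(sa[1] - sb[1])), reverse=True)
    sd = singular_values_2x2(subtract(a, b))
    return min(sd[0] - diff[0], (sd[0] + sd[1]) - (diff[0] + diff[1]))

def random_matrix(rng):
    def z():
        return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    return ((z(), z()), (z(), z()))

rng = random.Random(2)
smallest_gap = min(
    weak_majorisation_gap(random_matrix(rng), random_matrix(rng)) for _ in range(1000)
)
print(smallest_gap >= -1e-7)    # the theorem predicts a nonnegative gap, up to rounding
```

A pair such as $A = I$, $B = -I$ shows how far from equality the inequality can be: the left side vanishes, while $s(A - B) = (2, 2)$.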


In Section III.2 we introduced the notation Eig A for a diagonal matrix 
whose diagonal entries are the eigenvalues of a matrix A. The majorisations 

$$\lambda^{\downarrow}(A) - \lambda^{\downarrow}(B) \prec \lambda(A - B) \prec \lambda^{\downarrow}(A) - \lambda^{\uparrow}(B)$$

for the eigenvalues of Hermitian matrices lead to norm inequalities 

$$|||\mathrm{Eig}^{\downarrow}(A) - \mathrm{Eig}^{\downarrow}(B)||| \le |||A - B||| \le |||\mathrm{Eig}^{\downarrow}(A) - \mathrm{Eig}^{\uparrow}(B)|||, \tag{IV.62}$$

for all unitarily invariant norms. This is just another way of expressing 
Theorem III.4.4. The inequalities of Theorem III.2.8 and Problem III.6.15 
are special cases of this. 


We will see several generalisations of this inequality and still other proofs 
of it. 


Exercise IV.3.5 If $\mathrm{Sing}^{\downarrow}(A)$ denotes the diagonal matrix whose diagonal 
entries are $s_1(A), \ldots, s_n(A)$, then it follows from Theorem IV.3.4 that for 
any two matrices A, B 

$$|||\mathrm{Sing}^{\downarrow}(A) - \mathrm{Sing}^{\downarrow}(B)||| \le |||A - B|||$$

for every unitarily invariant norm. Show that in this case the “opposite 
inequality” 

$$|||A - B||| \le |||\mathrm{Sing}^{\downarrow}(A) - \mathrm{Sing}^{\uparrow}(B)|||$$

is not always true. 
IV.4  Weakly Unitarily Invariant Norms 


Consider the following numbers associated with an $n \times n$ matrix: 

(i) $|\mathrm{tr}\, A| = |\sum_j \lambda_j(A)|$; 

(ii) $\mathrm{spr}\, A = \max_{1 \le j \le n} |\lambda_j(A)|$, the spectral radius of A; 

(iii) $w(A) = \max_{\|x\| = 1} |\langle x, Ax \rangle|$, the numerical radius of A. 

Of these, the first one is a seminorm but not a norm on M(n), the second 
one is not a seminorm, and the third one is a norm. (See Exercise I.2.10.) 
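
The first two claims can be illustrated with explicit $2 \times 2$ matrices. In the sketch below the eigenvalues are computed from the trace and determinant; the function names and the particular matrices are choices made here. Two nilpotent matrices have spectral radius 0 while their sum has spectral radius 1, so the triangle inequality fails for spr; and a nonzero matrix with zero trace shows that $|\mathrm{tr}\, A|$ is not a norm.

```python
import cmath

def eigenvalues_2x2(a):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a11, a12), (a21, a22) = a
    tr = a11 + a22
    det = a11 * a22 - a12 * a21
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

def spectral_radius(a):
    """spr A: the largest modulus of an eigenvalue."""
    return max(abs(z) for z in eigenvalues_2x2(a))

def abs_trace(a):
    """|tr A|."""
    return abs(a[0][0] + a[1][1])

def add(a, b):
    """Entrywise sum A + B."""
    return tuple(tuple(x + y for x, y in zip(ra, rb)) for ra, rb in zip(a, b))

N1 = ((0, 1), (0, 0))      # nilpotent: spr = 0
N2 = ((0, 0), (1, 0))      # nilpotent: spr = 0
Z = ((1, 0), (0, -1))      # nonzero matrix with zero trace

print(spectral_radius(N1), spectral_radius(N2), spectral_radius(add(N1, N2)))
print(abs_trace(Z))
```

Here `spectral_radius(add(N1, N2))` exceeds `spectral_radius(N1) + spectral_radius(N2)`, and `abs_trace(Z)` is zero although Z is not the zero matrix.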

All three functions of a matrix described above have an important 
invariance property: they do not change under unitary conjugations; i.e., 
the transformations $A \to UAU^*$, U unitary, do not change these functions. 
Indeed, the first two are invariant under the larger class of similarity 
transformations $A \to SAS^{-1}$, S invertible. The third one is not 
invariant under all such transformations. 


Exercise IV.4.1 Show that no norm on M(n) can be invariant under all 
similarity transformations. 


Unlike the norms that were studied in Section 2, none of the three 
functions mentioned above is invariant under all transformations $A \to UAV$, 
where U, V vary over the unitary group U(n). 


We will call a norm $\tau$ on M(n) weakly unitarily invariant (wui, for 
short) if 

$$\tau(A) = \tau(UAU^*) \quad \text{for all } A \in M(n),\ U \in U(n). \tag{IV.63}$$


Examples of such norms include the unitarily invariant norms and the 
numerical radius. Some more will be constructed now. 


Exercise IV.4.2 Let $E_{11}$ be the diagonal matrix with its top left entry 1 and all other entries zero. Then

$$w(A) = \max_{U \in U(n)} |\mathrm{tr}\, E_{11} U A U^*|.$$ (IV.64)

Equivalently,

$$w(A) = \max\{|\mathrm{tr}\, AP| : P \text{ is an orthogonal projection of rank } 1\}.$$

Given a matrix $C$, let

$$w_C(A) = \max_{U \in U(n)} |\mathrm{tr}\, C U A U^*|, \quad A \in M(n).$$ (IV.65)

This is called the C-numerical radius of $A$.

Exercise IV.4.3 For every $C \in M(n)$, the C-numerical radius $w_C$ is a wui seminorm on $M(n)$.

Proposition IV.4.4 The C-numerical radius $w_C$ is a norm on $M(n)$ if and only if

(i) $C$ is not a scalar multiple of $I$, and

(ii) $\mathrm{tr}\, C \ne 0$.


IV.4 Weakly Unitarily Invariant Norms 103 


Proof. If $C = \lambda I$ for any $\lambda \in \mathbb{C}$, then $w_C(A) = |\lambda|\,|\mathrm{tr}\, A|$, and this is zero if $\mathrm{tr}\, A = 0$. So $w_C$ cannot be a norm. If $\mathrm{tr}\, C = 0$, then $w_C(I) = |\mathrm{tr}\, C| = 0$. Again $w_C$ is not a norm. Thus (i) and (ii) are necessary conditions for $w_C$ to be a norm.

Conversely, suppose (i) and (ii) hold and $w_C(A) = 0$ for some $A \ne 0$. If $A$ were a scalar multiple of $I$, this would mean that $\mathrm{tr}\, C = 0$. So, since $\mathrm{tr}\, C \ne 0$, $A$ is not a scalar multiple of $I$. Hence $A$ has an eigenspace $M$ of dimension $m$, for some $0 < m < n$. Since $e^{tK}$ is a unitary matrix for all real $t$ and skew-Hermitian $K$, the condition $w_C(A) = 0$ implies in particular that

$$\mathrm{tr}\, C e^{tK} A e^{-tK} = 0 \quad \text{if } t \in \mathbb{R},\ K = -K^*.$$

Differentiating this relation at $t = 0$, one gets

$$\mathrm{tr}\, (AC - CA)K = 0 \quad \text{if } K = -K^*.$$

Since every matrix is a complex linear combination of skew-Hermitian matrices, we also have

$$\mathrm{tr}\, (AC - CA)X = 0 \quad \text{for all } X \in M(n).$$

Hence $AC = CA$. (Recall that $\langle S, T \rangle = \mathrm{tr}\, S^*T$ is an inner product on $M(n)$.) Since $C$ commutes with $A$, it leaves invariant the $m$-dimensional eigenspace $M$ of $A$ we mentioned earlier. Now, note that since $w_C(A) = w_C(UAU^*)$, $C$ also commutes with $UAU^*$ for every $U \in U(n)$. But $UAU^*$ has the space $UM$ as an eigenspace. So, $C$ also leaves $UM$ invariant for all $U \in U(n)$. But this would mean that $C$ leaves all $m$-dimensional subspaces invariant, which in turn would mean $C$ leaves all one-dimensional subspaces invariant, which is possible only if $C$ is a scalar multiple of $I$. This contradicts (i), so $w_C(A) = 0$ forces $A = 0$. ■


More examples of wui norms are given in the following exercise. 


Exercise IV.4.5 (i) $\tau(A) = \|A\| + |\mathrm{tr}\, A|$ is a wui norm. More generally, the sum of any wui norm and a wui seminorm is a wui norm.

(ii) $\tau(A) = \max(\|A\|, |\mathrm{tr}\, A|)$ is a wui norm. More generally, the maximum of any wui norm and a wui seminorm is a wui norm.

(iii) Let $W(A)$ be the numerical range of $A$. Then its diameter $\mathrm{diam}\, W(A)$ is a wui seminorm on $M(n)$. It can be used to generate wui norms as in (i) and (ii). Of particular interest would be the norm $\tau(A) = w(A) + \mathrm{diam}\, W(A)$.

(iv) Let $m(A)$ be any norm on $M(n)$. Then

$$\tau(A) = \max_{U \in U(n)} m(UAU^*)$$

is a wui norm.

(v) Let $m(A)$ be any norm on $M(n)$. Then

$$\tau(A) = \int_{U(n)} m(UAU^*)\, dU,$$

where the integral is with respect to the (normalised) Haar measure on $U(n)$, is a wui norm.

(vi) Let

$$\tau(A) = \max_{e_1, \ldots, e_n} \max_{i,j} |\langle e_i, Ae_j \rangle|,$$

where $e_1, \ldots, e_n$ varies over all orthonormal bases. Then $\tau$ is a wui norm. How is this related to (ii) and (iv) above?
Let $S$ be the unit sphere in $\mathbb{C}^n$,

$$S = \{x \in \mathbb{C}^n : \|x\| = 1\},$$

and let $C(S)$ be the space of all complex valued continuous functions on $S$. Let $dx$ denote the normalised Lebesgue measure on $S$. Consider the familiar $L_p$-norms on $C(S)$ defined as

$$N_p(f) = \|f\|_p = \left( \int_S |f(x)|^p\, dx \right)^{1/p}, \quad 1 \le p < \infty,$$
$$N_\infty(f) = \|f\|_\infty = \max_{x \in S} |f(x)|.$$ (IV.66)

Since the measure $dx$ is invariant under rotations, the above norms satisfy the invariance property

$$N_p(f \circ U) = N_p(f) \quad \text{for all } f \in C(S),\ U \in U(n).$$

We will call a norm $N$ on $C(S)$ a unitarily invariant function norm if

$$N(f \circ U) = N(f) \quad \text{for all } f \in C(S),\ U \in U(n).$$ (IV.67)

The $L_p$-norms are important examples of such norms.

Now, every $A \in M(n)$ induces, naturally, a function $f_A$ on $S$ by its quadratic form:

$$f_A(x) = \langle x, Ax \rangle.$$ (IV.68)

The correspondence $A \to f_A$ is a linear map from $M(n)$ into $C(S)$, which is one-to-one. So, given a norm $N$ on $C(S)$, if we define a function $N'$ on $M(n)$ as

$$N'(A) = N(f_A),$$ (IV.69)

then $N'$ is a norm on $M(n)$. Further,

$$N'(UAU^*) = N(f_{UAU^*}) = N(f_A \circ U^*).$$

So, if $N$ is a unitarily invariant function norm on $C(S)$ then $N'$ is a wui norm on $M(n)$. The next theorem says that all wui norms arise in this way:
Theorem IV.4.6 A norm $\tau$ on $M(n)$ is weakly unitarily invariant if and only if there exists a unitarily invariant function norm $N$ on $C(S)$ such that $\tau = N'$, where the map $N \to N'$ is defined by relations (IV.68) and (IV.69).


Proof. We need to prove that every wui norm $\tau$ on $M(n)$ is of the form $N'$ for some unitarily invariant function norm $N$.

Let $F = \{f_A : A \in M(n)\}$. This is a finite-dimensional linear subspace of $C(S)$. Given a wui norm $\tau$, define $N_0$ on $F$ by

$$N_0(f_A) = \tau(A).$$ (IV.70)

Then $N_0$ defines a norm on $F$, and further, $N_0(f \circ U) = N_0(f)$ for all $f \in F$. We will extend $N_0$ from $F$ to all of $C(S)$ to obtain a norm $N$ that is unitarily invariant. Clearly, then $\tau = N'$.

This extension is obtained by an application of the Hahn-Banach Theorem. The space $C(S)$ is a Banach space with the supremum norm $\|f\|_\infty$. The finite-dimensional subspace $F$ has two norms $N_0$ and $\|\cdot\|_\infty$. These must be equivalent: there exist constants $0 < \alpha \le \beta < \infty$ such that $\alpha \|f\|_\infty \le N_0(f) \le \beta \|f\|_\infty$ for all $f \in F$. Let $G$ be the set of all linear functionals on $F$ that have norm less than or equal to 1 with respect to the norm $N_0$; i.e., the linear functional $g$ is in $G$ if and only if $|g(f)| \le N_0(f)$ for all $f \in F$. By duality then $N_0(f) = \sup_{g \in G} |g(f)|$, for every $f \in F$. Now $|g(f)| \le \beta \|f\|_\infty$ for $g \in G$ and $f \in F$. Hence, by the Hahn-Banach Theorem, each $g$ can be extended to a linear functional $\tilde{g}$ on $C(S)$ such that $|\tilde{g}(f)| \le \beta \|f\|_\infty$ for all $f \in C(S)$. Now define

$$\theta(f) = \sup_{g \in G} |\tilde{g}(f)|, \quad \text{for all } f \in C(S).$$

Then $\theta$ is a seminorm on $C(S)$ that coincides with $N_0$ on $F$. Let

$$\mu(f) = \max\{\theta(f), \alpha \|f\|_\infty\}, \quad f \in C(S).$$

Then $\mu$ is a norm on $C(S)$, and $\mu$ coincides with $N_0$ on $F$. Now define

$$N(f) = \sup_{U \in U(n)} \mu(f \circ U), \quad f \in C(S).$$

Then $N$ is a unitarily invariant function norm on $C(S)$ that coincides with $N_0$ on $F$. The proof is complete. ■


When $N = \|\cdot\|_\infty$ the norm $N'$ induced by the above procedure is the numerical radius $w$. Another example is discussed in the Notes.

The C-numerical radii play a useful role in proving inequalities for wui 
norms: 
Theorem IV.4.7 For $A, B \in M(n)$ the following statements are equivalent:

(i) $\tau(A) \le \tau(B)$ for all wui norms $\tau$.

(ii) $w_C(A) \le w_C(B)$ for all upper triangular matrices $C$ that are not scalars and have nonzero trace.

(iii) $w_C(A) \le w_C(B)$ for all $C \in M(n)$.

(iv) $A$ can be expressed as a finite sum $A = \sum_k z_k U_k B U_k^*$, where $U_k \in U(n)$ and $z_k$ are complex numbers with $\sum_k |z_k| \le 1$.


Proof. By Proposition IV.4.4, when $C$ is not a scalar and $\mathrm{tr}\, C \ne 0$, each $w_C$ is a wui norm. So (i) $\Rightarrow$ (ii).

Note that $w_C(A) = w_A(C)$ for all pairs of matrices $A, C$. So, if (ii) is true, then $w_A(C) \le w_B(C)$ for all upper triangular nonscalar matrices $C$ with nonzero trace. Since $w_A$ and $w_B$ are wui, and since every matrix is unitarily equivalent to an upper triangular matrix, this implies that $w_A(C) \le w_B(C)$ for all nonscalar matrices $C$ with nonzero trace. But such $C$ are dense in the space $M(n)$. So $w_A(C) \le w_B(C)$ for all $C \in M(n)$. Hence (iii) is true.

Let $K$ be the convex hull of all matrices $e^{i\theta} UBU^*$, $\theta \in \mathbb{R}$, $U \in U(n)$. Then $K$ is a compact convex set in $M(n)$. The statement (iv) is equivalent to saying that $A \in K$. If $A \notin K$, then by the Separating Hyperplane Theorem there exists a linear functional $f$ on $M(n)$ such that $\mathrm{Re}\, f(A) > \mathrm{Re}\, f(X)$ for all $X \in K$. For this linear functional $f$ there exists a matrix $C$ such that $f(Y) = \mathrm{tr}\, CY$ for all $Y \in M(n)$. (Problem IV.5.8) For these $f$ and $C$ we have

$$\begin{aligned}
w_C(A) = \max_{U \in U(n)} |\mathrm{tr}\, CUAU^*| &\ge |\mathrm{tr}\, CA| = |f(A)| \ge \mathrm{Re}\, f(A) \\
&> \max_{X \in K} \mathrm{Re}\, f(X) \\
&= \max_{\theta, U} \mathrm{Re}\, \mathrm{tr}\, C e^{i\theta} UBU^* \\
&= \max_{U} |\mathrm{tr}\, CUBU^*| \\
&= w_C(B).
\end{aligned}$$

So, if (iii) were true, then (iv) cannot be false.

Clearly (iv) $\Rightarrow$ (i). ■


The family $w_C$ of C-numerical radii, where $C$ is not a scalar and has nonzero trace, thus plays a role analogous to that of the Ky Fan norms in the family of unitarily invariant norms. However, unlike the Ky Fan family on $M(n)$, this family is infinite. It turns out that no finite subfamily of wui norms can play this role.




More precisely, there does not exist any finite family $\tau_1, \ldots, \tau_m$ of wui norms on $M(n)$ that would lead to the inequalities $\tau(A) \le \tau(B)$ for all wui norms whenever $\tau_j(A) \le \tau_j(B)$, $1 \le j \le m$. For if such a family existed, then we would have

$$\{X : \tau(X) \le \tau(I) \text{ for all wui norms } \tau\} = \bigcap_{j=1}^{m} \{X : \tau_j(X) \le \tau_j(I)\}.$$ (IV.71)

Now each of the sets in this intersection contains 0 as an interior point (with respect to some fixed topology on $M(n)$). Hence the intersection also contains 0 as an interior point. However, by Theorem IV.4.7, the set on the left-hand side of (IV.71) reduces to the set $\{zI : z \in \mathbb{C}, |z| \le 1\}$, and this set has an empty interior in $M(n)$.
Finally, note an important property of all wui norms:

$$\tau(\mathcal{C}(A)) \le \tau(A)$$ (IV.72)

for all $A \in M(n)$ and all pinchings $\mathcal{C}$ on $M(n)$.


In Chapter 6 we will prove a generalisation of Lidskii’s inequality (IV.62) 
extending it to all wui norms. 


IV.5 Problems 


Problem IV.5.1. When $0 < p < 1$, the function $\Phi_p(x) = (\sum_i |x_i|^p)^{1/p}$ does not define a norm. Show that in lieu of the triangle inequality we have

$$\Phi_p(x + y) \le 2^{\frac{1}{p} - 1} [\Phi_p(x) + \Phi_p(y)], \quad 0 < p < 1.$$

(Use the fact that $f(t) = t^p$ on $\mathbb{R}_+$ is subadditive when $0 < p \le 1$ and convex when $p \ge 1$.)

Positive homogeneous functions that do not satisfy the triangle inequality but a weaker inequality $\nu(x + y) \le c[\nu(x) + \nu(y)]$ for some constant $c > 1$ are sometimes called quasi-norms.


Problem IV.5.2. More generally, show that for any symmetric gauge function $\Phi$ and $0 < p < 1$, if we define $\Phi^{(p)}$ as in (IV.17), then

$$\Phi^{(p)}(x + y) \le 2^{\frac{1}{p} - 1} [\Phi^{(p)}(x) + \Phi^{(p)}(y)], \quad 0 < p < 1.$$


Problem IV.5.3. All norms on $\mathbb{C}^n$ are equivalent in the sense that if $\Phi$ and $\Psi$ are two norms, then there exists a constant $K$ such that $\Phi(x) \le K\Psi(x)$ for all $x \in \mathbb{C}^n$. Let

$$K_{\Phi,\Psi} = \inf\{K : \Phi(x) \le K\Psi(x) \text{ for all } x \in \mathbb{C}^n\}.$$

Find the constants $K_{\Phi,\Psi}$ when $\Phi, \Psi$ are both members of the family $\Phi_p$.


Problem IV.5.4. Show that for every norm $\Phi$ on $\mathbb{C}^n$ we have $\Phi'' = \Phi$; i.e., the dual of the dual of a norm is the norm itself.

Problem IV.5.5. Find the duals of the norms $\Phi_{(k)}^{(p)}$ defined by (IV.19). (These are somewhat complicated.)


Problem IV.5.6. For $0 < p < 1$ and a unitarily invariant norm $|||\cdot|||$ on $M(n)$, let

$$|||A|||^{(p)} = |||\, |A|^p \,|||^{1/p}.$$

Show that

$$|||A + B|||^{(p)} \le 2^{\frac{1}{p} - 1} [\, |||A|||^{(p)} + |||B|||^{(p)} \,].$$


Problem IV.5.7. Choosing $p = q = 2$ in (IV.43) or (IV.42), one obtains

$$|||AB||| \le |||A^*A|||^{1/2}\, |||B^*B|||^{1/2}.$$

This, like the inequality (IV.44), is also a form of the Cauchy-Schwarz inequality, for unitarily invariant norms. Show that this is just the inequality (IV.44) restricted to Q-norms.

Problem IV.5.8. Let $f$ be any linear functional on $M(n)$. Show that there exists a unique matrix $X$ such that $f(A) = \mathrm{tr}\, XA$ for all $A \in M(n)$.


Problem IV.5.9. Use Theorem IV.2.14 to show that for all $A, B \in M(n)$

$$\det(I + |A + B|) \le \det(I + |A|) \det(I + |B|).$$

Problem IV.5.10. More generally, show that for $0 < p \le 1$ and $\mu \ge 0$

$$\det(I + \mu|A + B|^p) \le \det(I + \mu|A|^p) \det(I + \mu|B|^p).$$


Problem IV.5.11. Let $\ell_p$ denote the space $\mathbb{C}^n$ with the $p$-norm defined in (IV.1) and (IV.2), $1 \le p \le \infty$. For a matrix $A$ let $\|A\|_{p \to p'}$ denote the norm of $A$ as a linear operator from $\ell_p$ to $\ell_{p'}$; i.e.,

$$\|A\|_{p \to p'} = \max_{\|x\|_p = 1} \|Ax\|_{p'}.$$

Show that

$$\|A\|_{1 \to 1} = \max_j \sum_i |a_{ij}|,$$
$$\|A\|_{\infty \to \infty} = \max_i \sum_j |a_{ij}|,$$
$$\|A\|_{1 \to \infty} = \max_{i,j} |a_{ij}|.$$

None of these norms is weakly unitarily invariant.


Problem IV.5.12. Show that there exists a weakly unitarily invariant norm $\tau$ such that $\tau(A) \ne \tau(A^*)$ for some $A \in M(n)$.

Problem IV.5.13. Show that there exists a weakly unitarily invariant norm $\tau$ such that $\tau(A) > \tau(B)$ for some positive matrices $A, B$ with $A \le B$.

Problem IV.5.14. Let $\tau$ be a wui norm on $M(n)$. Define $\nu$ on $M(n)$ as $\nu(A) = \tau(|A|)$. Then $\nu$ is a unitarily invariant norm if and only if $\tau(A) \le \tau(B)$ whenever $0 \le A \le B$.

Problem IV.5.15. Show that for every wui norm $\tau$

$$\tau(\mathrm{Eig}\, A) = \inf\{\tau(SAS^{-1}) : S \in GL(n)\}.$$

When is the infimum attained?

Problem IV.5.16. Let $\tau$ be a wui norm on $M(n)$. Show that for every $A$

$$\tau(A) \ge \frac{|\mathrm{tr}\, A|}{n}\, \tau(I).$$

Use this to show that

$$\min\{\tau(A - B) : \mathrm{tr}\, B = 0\} = \frac{|\mathrm{tr}\, A|}{n}\, \tau(I).$$


IV.6 Notes and References 111 


For other values of p, the correspondence has not been worked out. 
For a recent survey of several results on invariant norms see C.-K. Li, 


Some aspects of the theory of norms, Linear Algebra Appl., 212/213 (1994) 
71-100. 


V 


Operator Monotone and Operator 
Convex Functions 


In this chapter we study an important and useful class of functions called 
operator monotone functions. These are real functions whose extensions 
to Hermitian matrices preserve order. Such functions have several special 
properties, some of which are studied in this chapter. They are closely 


related to properties of operator convex functions. We shall study both of 
these together. 


V.1 Definitions and Simple Examples 


Let $f$ be a real function defined on an interval $I$. If $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix whose diagonal entries $\lambda_j$ are in $I$, we define $f(D) = \mathrm{diag}(f(\lambda_1), \ldots, f(\lambda_n))$. If $A$ is a Hermitian matrix whose eigenvalues $\lambda_j$ are in $I$, we choose a unitary $U$ such that $A = UDU^*$, where $D$ is diagonal, and then define $f(A) = Uf(D)U^*$. In this way we can define $f(A)$ for all Hermitian matrices (of any order) whose eigenvalues are in $I$. In the rest of this chapter, it will always be assumed that our functions are real functions defined on an interval (finite or infinite, closed or open) and are extended to Hermitian matrices in this way.

We will use the notation $A \le B$ to mean $A$ and $B$ are Hermitian and $B - A$ is positive. The relation $\le$ is a partial order on Hermitian matrices.

A function $f$ is said to be matrix monotone of order $n$ if it is monotone with respect to this order on $n \times n$ Hermitian matrices, i.e., if $A \le B$ implies $f(A) \le f(B)$. If $f$ is matrix monotone of order $n$ for all $n$ we say $f$ is matrix monotone or operator monotone.
A function $f$ is said to be matrix convex of order $n$ if for all $n \times n$ Hermitian matrices $A$ and $B$ and for all real numbers $0 \le \lambda \le 1$,

$$f((1 - \lambda)A + \lambda B) \le (1 - \lambda) f(A) + \lambda f(B).$$ (V.1)

If $f$ is matrix convex of all orders, we say that $f$ is matrix convex or operator convex.

(Note that if the eigenvalues of $A$ and $B$ are all in an interval $I$, then the eigenvalues of any convex combination of $A, B$ are also in $I$. This is an easy consequence of results in Chapter III.)

We will consider continuous functions only. In this case, the condition (V.1) can be replaced by the more special condition

$$f\left(\frac{A + B}{2}\right) \le \frac{f(A) + f(B)}{2}.$$ (V.2)

(Functions satisfying (V.2) are called mid-point operator convex, and if they are continuous, then they are operator convex.)

A function f is called operator concave if the function —f is operator 
convex. 

It is clear that the set of operator monotone functions and the set of operator convex functions are both closed under positive linear combinations and also under (pointwise) limits. In other words, if $f, g$ are operator monotone, and if $\alpha, \beta$ are positive real numbers, then $\alpha f + \beta g$ is also operator monotone. If $f_n$ are operator monotone, and if $f_n(x) \to f(x)$, then $f$ is also operator monotone. The same is true for operator convex functions.

Example V.1.1 The function $f(t) = \alpha + \beta t$ is operator monotone (on every interval) for every $\alpha \in \mathbb{R}$ and $\beta \ge 0$. It is operator convex for all $\alpha, \beta \in \mathbb{R}$.


The first surprise is in the following example. 


Example V.1.2 The function $f(t) = t^2$ on $[0, \infty)$ is not operator monotone. In other words, there exist positive matrices $A, B$ such that $B - A$ is positive but $B^2 - A^2$ is not. To see this, take

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}.$$
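The counterexample of Example V.1.2 can be verified by direct computation. The sketch below assumes the matrices $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ and uses the fact that a real symmetric $2 \times 2$ matrix is positive semidefinite exactly when its trace and determinant are both nonnegative; the helper names are illustrative only.

```python
def matmul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub(X, Y):
    """Entrywise difference X - Y."""
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def is_positive(X):
    """A real symmetric 2 x 2 matrix is positive semidefinite
    iff its trace and determinant are both nonnegative."""
    trace = X[0][0] + X[1][1]
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return trace >= 0 and det >= 0

A = [[1, 1], [1, 1]]
B = [[2, 1], [1, 1]]

diff = sub(B, A)                           # B - A
diff_sq = sub(matmul(B, B), matmul(A, A))  # B^2 - A^2

print(diff, is_positive(diff))        # B - A is positive
print(diff_sq, is_positive(diff_sq))  # B^2 - A^2 has negative determinant
```

Here $B^2 - A^2$ has determinant $-1$, so it has a negative eigenvalue even though $B - A$ is a projection.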
Example V.1.3 The function $f(t) = t^2$ is operator convex on every interval. To see this, note that for any Hermitian matrices $A, B$,

$$\frac{A^2 + B^2}{2} - \left(\frac{A + B}{2}\right)^2 = \frac{1}{4}(A^2 + B^2 - AB - BA) = \frac{1}{4}(A - B)^2 \ge 0.$$

This shows that the function $f(t) = \alpha + \beta t + \gamma t^2$ is operator convex for all $\alpha, \beta \in \mathbb{R}$, $\gamma \ge 0$.
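Example V.1.4 The function $f(t) = t^3$ on $[0, \infty)$ is not operator convex. To see this, let

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix}.$$

Then,

$$\frac{A^3 + B^3}{2} - \left(\frac{A + B}{2}\right)^3 = \begin{pmatrix} 6 & 1 \\ 1 & 0 \end{pmatrix},$$

and this is not positive.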
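The matrix displayed in Example V.1.4 can be reproduced with exact rational arithmetic. This is a small sketch under the assumption that the example uses $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 3 & 1 \\ 1 & 1 \end{pmatrix}$, a choice that yields exactly the matrix shown above; the helper names are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def cube(X):
    return matmul(X, matmul(X, X))

# Assumed matrices (they reproduce the displayed difference exactly).
A = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(1)]]
B = [[Fraction(3), Fraction(1)], [Fraction(1), Fraction(1)]]

A3, B3 = cube(A), cube(B)
mid = [[(A[i][j] + B[i][j]) / 2 for j in range(2)] for i in range(2)]
mid3 = cube(mid)

# gap = (A^3 + B^3)/2 - ((A + B)/2)^3
gap = [[(A3[i][j] + B3[i][j]) / 2 - mid3[i][j] for j in range(2)]
       for i in range(2)]
det_gap = gap[0][0] * gap[1][1] - gap[0][1] * gap[1][0]

print([[int(x) for x in row] for row in gap])
print(det_gap)
```

A negative determinant for a $2 \times 2$ Hermitian matrix means one eigenvalue is negative, which is why the difference fails to be positive.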


Examples V.1.2 and V.1.4 show that very simple functions which are 
monotone (convex) as real functions need not be operator monotone (op- 
erator convex). A complete description of operator monotone and operator 
convex functions will be given in later sections. It is instructive to study a 
few more examples first. The operator monotonicity or convexity of some 
functions can be proved by special arguments that are useful in other con- 
texts as well. 

We will repeatedly use two simple facts. If $A$ is positive, then $A \le I$ if and only if $\mathrm{spr}(A) \le 1$. An operator $A$ is a contraction ($\|A\| \le 1$) if and only if $A^*A \le I$. This is also equivalent to the condition $AA^* \le I$.

The following elementary lemma is also used often. 


Lemma V.1.5 If $B \ge A$, then for every operator $X$ we have $X^*BX \ge X^*AX$.

Proof. For every vector $u$ we have,

$$\langle u, X^*BXu \rangle = \langle Xu, BXu \rangle \ge \langle Xu, AXu \rangle = \langle u, X^*AXu \rangle.$$

This proves the lemma.

An equally brief proof goes as follows. Let $C$ be the positive square root of the positive operator $B - A$. Then

$$X^*(B - A)X = X^*CCX = (CX)^*CX \ge 0.$$ ■


Proposition V.1.6 The function $f(t) = -\frac{1}{t}$ is operator monotone on $(0, \infty)$.

Proof. Let $B \ge A > 0$. Then, by Lemma V.1.5, $I \ge B^{-1/2}AB^{-1/2}$. Since the map $T \to T^{-1}$ is order-reversing on commuting positive operators, we have $I \le B^{1/2}A^{-1}B^{1/2}$. Again, using Lemma V.1.5 we get from this $B^{-1} \le A^{-1}$. ■

Lemma V.1.7 If $B \ge A \ge 0$ and $B$ is invertible, then $\|A^{1/2}B^{-1/2}\| \le 1$.

Proof. If $B \ge A \ge 0$, then $I \ge B^{-1/2}AB^{-1/2} = (A^{1/2}B^{-1/2})^* A^{1/2}B^{-1/2}$, and hence $\|A^{1/2}B^{-1/2}\| \le 1$. ■
Proposition V.1.8 The function $f(t) = t^{1/2}$ is operator monotone on $[0, \infty)$.

Proof. Let $B \ge A \ge 0$. Suppose $B$ is invertible. Then, by Lemma V.1.7,

$$1 \ge \|A^{1/2}B^{-1/2}\| \ge \mathrm{spr}(A^{1/2}B^{-1/2}) = \mathrm{spr}(B^{-1/4}A^{1/2}B^{-1/4}).$$

Since $B^{-1/4}A^{1/2}B^{-1/4}$ is positive, this implies that $I \ge B^{-1/4}A^{1/2}B^{-1/4}$. Hence, by Lemma V.1.5, $B^{1/2} \ge A^{1/2}$. This proves the proposition under the assumption that $B$ is invertible. If $B$ is not strictly positive, then for every $\varepsilon > 0$, $B + \varepsilon I$ is strictly positive. So, $(B + \varepsilon I)^{1/2} \ge A^{1/2}$. Let $\varepsilon \to 0$. This shows that $B^{1/2} \ge A^{1/2}$. ■
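As a numerical illustration of Proposition V.1.8, the sketch below takes the pair $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \le B = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ (the pair for which squaring fails to preserve order) and checks that the square roots are correctly ordered. It relies on an assumption not stated in the text: the closed form $M^{1/2} = (M + sI)/t$ with $s = \sqrt{\det M}$ and $t = \sqrt{\mathrm{tr}\, M + 2s}$, which follows from the Cayley-Hamilton theorem for nonzero positive $2 \times 2$ matrices.

```python
import math

def sqrt_psd(M):
    """Positive square root of a nonzero positive semidefinite 2 x 2 real
    matrix, via (M + s I) / t with s = sqrt(det M), t = sqrt(tr M + 2 s)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    s = math.sqrt(max(det, 0.0))
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]

def is_positive(X, tol=1e-12):
    """Trace/determinant test for positive semidefiniteness (2 x 2 symmetric)."""
    trace = X[0][0] + X[1][1]
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return trace >= -tol and det >= -tol

A = [[1.0, 1.0], [1.0, 1.0]]
B = [[2.0, 1.0], [1.0, 1.0]]

rA, rB = sqrt_psd(A), sqrt_psd(B)
D = [[rB[i][j] - rA[i][j] for j in range(2)] for i in range(2)]  # B^{1/2} - A^{1/2}

print(D)
print(is_positive(D))
```

For this pair the difference of square roots has positive trace and positive determinant, consistent with the proposition.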


Theorem V.1.9 The function $f(t) = t^r$ is operator monotone on $[0, \infty)$ for $0 \le r \le 1$.

Proof. Let $r$ be a dyadic rational, i.e., a number of the form $r = \frac{m}{2^n}$, where $n$ is any positive integer and $1 \le m \le 2^n$. We will first prove the assertion for such $r$. This is done by induction on $n$.

Proposition V.1.8 shows that the assertion of the theorem is true when $n = 1$. Suppose it is also true for all dyadic rationals $\frac{m}{2^j}$, in which $1 \le j \le n - 1$. Let $B \ge A$ and let $r = \frac{m}{2^n}$. Suppose $m \le 2^{n-1}$. Then, by the induction hypothesis, $B^{m/2^{n-1}} \ge A^{m/2^{n-1}}$. Hence, by Proposition V.1.8, $B^{m/2^n} \ge A^{m/2^n}$. Suppose $m > 2^{n-1}$. If $B \ge A > 0$, then $A^{-1} \ge B^{-1}$. Using Lemma V.1.5, we have $B^{m/2^n} A^{-1} B^{m/2^n} \ge B^{m/2^n} B^{-1} B^{m/2^n} = B^{(m/2^{n-1} - 1)}$. By the same argument,

$$\begin{aligned}
A^{-1/2} B^{m/2^n} A^{-1} B^{m/2^n} A^{-1/2} &\ge A^{-1/2} B^{(m/2^{n-1} - 1)} A^{-1/2} \\
&\ge A^{-1/2} A^{(m/2^{n-1} - 1)} A^{-1/2}
\end{aligned}$$

(by the induction hypothesis). This can be written also as

$$(A^{-1/2} B^{m/2^n} A^{-1/2})^2 \ge A^{(m/2^{n-1} - 2)}.$$

So, by the operator monotonicity of the square root,

$$A^{-1/2} B^{m/2^n} A^{-1/2} \ge A^{(m/2^n - 1)}.$$

Hence, $B^{m/2^n} \ge A^{m/2^n}$.

We have shown that $B \ge A > 0$ implies $B^r \ge A^r$ for all dyadic rationals $r$ in $[0, 1]$. Such $r$ are dense in $[0, 1]$. So we have $B^r \ge A^r$ for all $r$ in $[0, 1]$. By continuity this is true even when $A$ is positive semidefinite. ■


Exercise V.1.10 Another proof of Theorem V.1.9 is outlined below. Fill in the details.

(i) The composition of two operator monotone functions is operator monotone. Use this and Proposition V.1.6 to prove that the function $f(t) = \frac{t}{1 + t}$ is operator monotone on $(0, \infty)$.

(ii) For each $\lambda > 0$, the function $f(t) = \frac{t}{\lambda + t}$ is operator monotone on $(0, \infty)$.

(iii) One of the integrals calculated by contour integration in Complex Analysis is

$$\int_0^\infty \frac{\lambda^{r-1}}{1 + \lambda}\, d\lambda = \pi\, \mathrm{cosec}\, r\pi, \quad 0 < r < 1.$$ (V.3)

By a change of variables, obtain from this the formula

$$t^r = \frac{\sin r\pi}{\pi} \int_0^\infty \frac{t}{\lambda + t}\, \lambda^{r-1}\, d\lambda$$ (V.4)

valid for all $t > 0$ and $0 < r < 1$.

(iv) Thus, we can write

$$t^r = \int_0^\infty \frac{t}{\lambda + t}\, d\mu(\lambda), \quad 0 < r < 1,$$ (V.5)

where $\mu$ is a positive measure on $(0, \infty)$. Now use (ii) to conclude that the function $f(t) = t^r$ is operator monotone on $(0, \infty)$ for $0 \le r \le 1$.

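Example V.1.11 The function $f(t) = |t|$ is not operator convex on any interval that contains 0. To see this, take

$$A = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}.$$

Then

$$|A| = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \quad |A| + |B| = \begin{pmatrix} 3 & -1 \\ -1 & 1 \end{pmatrix}.$$

But $|A + B| = \sqrt{2}\, I$. So $|A| + |B| - |A + B|$ is not positive. (See also Exercise III.5.7.)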
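The claims of Example V.1.11 can be checked directly. The sketch below assumes the matrices $A = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}$, which are consistent with the eigenvalues $-2$ and $0$ quoted for $A$ in Example V.1.12; it confirms that $(A + B)^2 = 2I$, so that $|A + B| = \sqrt{2}\, I$, and that $|A| + |B| - |A + B|$ has a negative diagonal entry.

```python
import math

def matmul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Assumed matrices for the example.
A = [[-1, 1], [1, -1]]
B = [[2, 0], [0, 0]]

abs_A = [[1, -1], [-1, 1]]   # equals -A, since A is negative semidefinite
abs_B = B                    # B is already positive

S = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
S_squared = matmul(S, S)     # equals 2 I, hence |A + B| = sqrt(2) I
abs_S = [[math.sqrt(2), 0.0], [0.0, math.sqrt(2)]]

gap = [[abs_A[i][j] + abs_B[i][j] - abs_S[i][j] for j in range(2)]
       for i in range(2)]

print(S_squared)
print(gap[1][1])             # negative, so the gap is not positive
```

A positive matrix must have nonnegative diagonal entries, so the negative $(2,2)$ entry $1 - \sqrt{2}$ already rules out positivity.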


Example V.1.12 The function $f(t) = t \vee 0$ is not operator convex on any interval that contains 0. To see this, take $A, B$ as in Example V.1.11. Since the eigenvalues of $A$ are $-2$ and $0$, $f(A) = 0$. So $\frac{1}{2}(f(A) + f(B)) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$. Any positive matrix dominated by this must have $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ as an eigenvector with 0 as the corresponding eigenvalue. Since $\frac{1}{2}(A + B)$ does not have $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ as an eigenvector, neither does $f\left(\frac{A + B}{2}\right)$.
Exercise V.1.13 Let $I$ be any interval. For $a \in I$, let $f(t) = (t - a) \vee 0$. Then $f$ is called an "angle function" angled at $a$. If $I$ is a finite interval, then every convex function on $I$ is a limit of positive linear combinations of linear functions and angle functions. Use this to show that angle functions are not operator convex.

Exercise V.1.14 Show that the function $f(t) = t \vee 0$ is not operator monotone on any interval that contains 0.


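Exercise V.1.15 Let $A, B$ be positive. Show that

$$\frac{A^{-1} + B^{-1}}{2} - \left(\frac{A + B}{2}\right)^{-1} = \frac{(A^{-1} - B^{-1})(A^{-1} + B^{-1})^{-1}(A^{-1} - B^{-1})}{2}.$$

Therefore, the function $f(t) = \frac{1}{t}$ is operator convex on $(0, \infty)$.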


V.2 Some Characterisations 


There are several different notions of averaging in the space of operators. In 
this section we study the relationship between some of these operations and 
operator convex functions. This leads to some characterisations of operator 
convex and operator monotone functions and to the interrelations between 
them. 

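In the proofs that are to follow, we will frequently use properties of operators on the direct sum $H \oplus H$ to draw conclusions about operators on $H$. This technique was outlined briefly in Section I.3.

Let $K$ be a contraction on $H$. Let $L = (I - KK^*)^{1/2}$, $M = (I - K^*K)^{1/2}$. Then the operators $U, V$ defined as

$$U = \begin{pmatrix} K & L \\ M & -K^* \end{pmatrix}, \quad V = \begin{pmatrix} K & -L \\ M & K^* \end{pmatrix}$$ (V.6)

are unitary operators on $H \oplus H$. (See Exercise I.3.6.) More specially, for each $0 \le \lambda \le 1$, the operator

$$W = \begin{pmatrix} \lambda^{1/2} I & -(1 - \lambda)^{1/2} I \\ (1 - \lambda)^{1/2} I & \lambda^{1/2} I \end{pmatrix}$$ (V.7)

is a unitary operator on $H \oplus H$.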

Theorem V.2.1 Let $f$ be a real function on an interval $I$. Then the following two statements are equivalent:

(i) $f$ is operator convex on $I$.

(ii) $f(\mathcal{C}(A)) \le \mathcal{C}(f(A))$ for every Hermitian operator $A$ (on a Hilbert space $H$) whose spectrum is contained in $I$ and for every pinching $\mathcal{C}$ (in the space $H$).
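Proof. (i) $\Rightarrow$ (ii): Every pinching is a product of pinchings by two complementary projections. (See Problems II.5.4 and II.5.5.) So we need to prove this implication only for pinchings $\mathcal{C}$ of the form

$$\mathcal{C}(X) = \frac{X + U^*XU}{2}, \quad \text{where } U = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}.$$

For such a $\mathcal{C}$

$$f(\mathcal{C}(A)) = f\left(\frac{A + U^*AU}{2}\right) \le \frac{f(A) + f(U^*AU)}{2} = \frac{f(A) + U^*f(A)U}{2} = \mathcal{C}(f(A)).$$

(ii) $\Rightarrow$ (i): Let $A, B$ be Hermitian operators on $H$, both having their spectrum in $I$. Consider the operator $T = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$ on $H \oplus H$. If $W$ is the unitary operator defined in (V.7), then the diagonal entries of $W^*TW$ are $\lambda A + (1 - \lambda)B$ and $(1 - \lambda)A + \lambda B$. So if $\mathcal{C}$ is the pinching on $H \oplus H$ induced by the projections onto the two summands, then

$$\mathcal{C}(W^*TW) = \begin{pmatrix} \lambda A + (1 - \lambda)B & 0 \\ 0 & (1 - \lambda)A + \lambda B \end{pmatrix}.$$

By the same argument,

$$\mathcal{C}(f(W^*TW)) = \mathcal{C}(W^*f(T)W) = \begin{pmatrix} \lambda f(A) + (1 - \lambda) f(B) & 0 \\ 0 & (1 - \lambda) f(A) + \lambda f(B) \end{pmatrix}.$$

So the condition $f(\mathcal{C}(W^*TW)) \le \mathcal{C}(f(W^*TW))$ implies that

$$f(\lambda A + (1 - \lambda)B) \le \lambda f(A) + (1 - \lambda) f(B).$$ ■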


Exercise V.2.2 The following conditions are equivalent:

(i) $f$ is operator convex on $I$.

(ii) $f(A_{\mathcal{M}}) \le (f(A))_{\mathcal{M}}$ for every Hermitian operator $A$ with its spectrum in $I$, and for every compression $T \to T_{\mathcal{M}}$.

(iii) $f(V^*AV) \le V^*f(A)V$ for every Hermitian operator $A$ (on $H$) with its spectrum in $I$, and for every isometry $V$ from any Hilbert space into $H$.

(See Section III.1 for the definition of a compression.)
Theorem V.2.3 Let $I$ be an interval containing 0 and let $f$ be a real function on $I$. Then the following conditions are equivalent:

(i) $f$ is operator convex on $I$ and $f(0) \le 0$.

(ii) $f(K^*AK) \le K^*f(A)K$ for every contraction $K$ and every Hermitian operator $A$ with spectrum in $I$.

(iii) $f(K_1^*AK_1 + K_2^*BK_2) \le K_1^*f(A)K_1 + K_2^*f(B)K_2$ for all operators $K_1, K_2$ such that $K_1^*K_1 + K_2^*K_2 \le I$ and for all Hermitian $A, B$ with spectrum in $I$.

(iv) $f(PAP) \le Pf(A)P$ for all projections $P$ and Hermitian operators $A$ with spectrum in $I$.


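Proof. (i) $\Rightarrow$ (ii): Let $T = \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}$ and let $U, V$ be the unitary operators defined in (V.6). Then

$$U^*TU = \begin{pmatrix} K^*AK & K^*AL \\ LAK & LAL \end{pmatrix}, \quad V^*TV = \begin{pmatrix} K^*AK & -K^*AL \\ -LAK & LAL \end{pmatrix}.$$

So,

$$\begin{pmatrix} K^*AK & 0 \\ 0 & LAL \end{pmatrix} = \frac{U^*TU + V^*TV}{2}.$$

Hence,

$$\begin{aligned}
\begin{pmatrix} f(K^*AK) & 0 \\ 0 & f(LAL) \end{pmatrix}
&= f\left(\frac{U^*TU + V^*TV}{2}\right) \\
&\le \frac{f(U^*TU) + f(V^*TV)}{2} \\
&= \frac{U^*f(T)U + V^*f(T)V}{2} \\
&= \frac{1}{2}\left\{ U^* \begin{pmatrix} f(A) & 0 \\ 0 & f(0) \end{pmatrix} U + V^* \begin{pmatrix} f(A) & 0 \\ 0 & f(0) \end{pmatrix} V \right\} \\
&\le \frac{1}{2}\left\{ U^* \begin{pmatrix} f(A) & 0 \\ 0 & 0 \end{pmatrix} U + V^* \begin{pmatrix} f(A) & 0 \\ 0 & 0 \end{pmatrix} V \right\} \\
&= \begin{pmatrix} K^*f(A)K & 0 \\ 0 & Lf(A)L \end{pmatrix}.
\end{aligned}$$

Hence, $f(K^*AK) \le K^*f(A)K$.

(ii) $\Rightarrow$ (iii): Let $T = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$, $K = \begin{pmatrix} K_1 & 0 \\ K_2 & 0 \end{pmatrix}$. Then $K$ is a contraction. Note that

$$K^*TK = \begin{pmatrix} K_1^*AK_1 + K_2^*BK_2 & 0 \\ 0 & 0 \end{pmatrix}.$$

Hence,

$$\begin{aligned}
\begin{pmatrix} f(K_1^*AK_1 + K_2^*BK_2) & 0 \\ 0 & f(0) \end{pmatrix}
&= f(K^*TK) \le K^*f(T)K \\
&= \begin{pmatrix} K_1^*f(A)K_1 + K_2^*f(B)K_2 & 0 \\ 0 & 0 \end{pmatrix}.
\end{aligned}$$

(iii) $\Rightarrow$ (iv) obviously.

(iv) $\Rightarrow$ (i): Let $A, B$ be Hermitian operators with spectrum in $I$ and let $0 \le \lambda \le 1$. Let $T = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$, $P = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}$ and let $W$ be the unitary operator defined by (V.7). Then

$$PW^*TWP = \begin{pmatrix} \lambda A + (1 - \lambda)B & 0 \\ 0 & 0 \end{pmatrix}.$$

So,

$$\begin{aligned}
\begin{pmatrix} f(\lambda A + (1 - \lambda)B) & 0 \\ 0 & f(0) \end{pmatrix}
&= f(PW^*TWP) \\
&\le Pf(W^*TW)P = PW^*f(T)WP \\
&= \begin{pmatrix} \lambda f(A) + (1 - \lambda) f(B) & 0 \\ 0 & 0 \end{pmatrix}.
\end{aligned}$$

Hence, $f$ is operator convex and $f(0) \le 0$. ■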


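Exercise V.2.4 (i) Let $\lambda_1, \lambda_2$ be positive real numbers such that $\lambda_1 \lambda_2 I \ge C^*C$. Then $\begin{pmatrix} \lambda_1 I & C^* \\ C & \lambda_2 I \end{pmatrix}$ is positive. (Use Proposition I.3.5.)

(ii) Let $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ be a Hermitian operator. Then for every $\varepsilon > 0$, there exists $\lambda > 0$ such that

$$\begin{pmatrix} A & C^* \\ C & B \end{pmatrix} \le \begin{pmatrix} A + \varepsilon I & 0 \\ 0 & \lambda I \end{pmatrix}.$$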

The next two theorems are among the several results that describe the 
connections between operator convexity and operator monotonicity. 


Theorem V.2.5 Let $f$ be a (continuous) function mapping the positive half-line $[0, \infty)$ into itself. Then $f$ is operator monotone if and only if it is operator concave.


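Proof. Suppose $f$ is operator monotone. If we show that $f(K^*AK) \ge K^*f(A)K$ for every positive operator $A$ and contraction $K$, then it would follow from Theorem V.2.3 that $f$ is operator concave. Let $T = \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}$ and let $U$ be the unitary operator defined in (V.6). Then $U^*TU = \begin{pmatrix} K^*AK & K^*AL \\ LAK & LAL \end{pmatrix}$. By the assertion in Exercise V.2.4(ii), given any $\varepsilon > 0$, there exists $\lambda > 0$ such that

$$U^*TU \le \begin{pmatrix} K^*AK + \varepsilon & 0 \\ 0 & \lambda I \end{pmatrix}.$$

Replacing $T$ by $f(T)$, we get

$$\begin{aligned}
\begin{pmatrix} K^*f(A)K & K^*f(A)L \\ Lf(A)K & Lf(A)L \end{pmatrix}
&\le U^*f(T)U = f(U^*TU) \\
&\le \begin{pmatrix} f(K^*AK + \varepsilon) & 0 \\ 0 & f(\lambda) I \end{pmatrix}
\end{aligned}$$

by the operator monotonicity of $f$. (The first inequality uses $f(0) \ge 0$.) In particular, this shows $K^*f(A)K \le f(K^*AK + \varepsilon)$ for every $\varepsilon > 0$. Hence $K^*f(A)K \le f(K^*AK)$.

Conversely, suppose $f$ is operator concave. Let $0 \le A \le B$. Then for any $0 < \lambda < 1$ we can write

$$\lambda B = \lambda A + (1 - \lambda) \frac{\lambda}{1 - \lambda}(B - A).$$

Since $f$ is operator concave, this gives

$$f(\lambda B) \ge \lambda f(A) + (1 - \lambda) f\left(\frac{\lambda}{1 - \lambda}(B - A)\right).$$

Since $f(X)$ is positive for every positive $X$, it follows that $f(\lambda B) \ge \lambda f(A)$. Now let $\lambda \to 1$. This shows $f(B) \ge f(A)$. So $f$ is operator monotone. ■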


Corollary V.2.6 Let $f$ be a continuous function from $(0, \infty)$ into itself. If $f$ is operator monotone then the function $g(t) = \frac{1}{f(t)}$ is operator convex.

Proof. Let $A, B$ be positive operators. Since $f$ is operator concave, $f\left(\frac{A + B}{2}\right) \ge \frac{f(A) + f(B)}{2}$. Since the map $X \to X^{-1}$ is order-reversing and convex on positive operators (see Proposition V.1.6 and Exercise V.1.15), this gives

$$\left[f\left(\frac{A + B}{2}\right)\right]^{-1} \le \left[\frac{f(A) + f(B)}{2}\right]^{-1} \le \frac{f(A)^{-1} + f(B)^{-1}}{2}.$$

This is the same as saying $g$ is operator convex. ■


Exercise V.2.7 Let $I$ be an interval containing 0, and let $f$ be a real function on $I$ with $f(0) \le 0$. Show that for every Hermitian operator $A$ with spectrum in $I$ and for every projection $P$

$$f(PAP) \le Pf(PAP) = Pf(PAP)P.$$

Exercise V.2.8 Let $f$ be a continuous real function on $[0, \infty)$. Then for all positive operators $A$ and projections $P$

$$f(A^{1/2}PA^{1/2})A^{1/2}P = A^{1/2}Pf(PAP).$$

(Prove this first, by induction, for $f(t) = t^n$. Then use the Weierstrass approximation theorem to show that this is true for all $f$.)


Theorem V.2.9 Let $f$ be a (continuous) real function on the interval $[0, \alpha)$. Then the following two conditions are equivalent:

(i) $f$ is operator convex and $f(0) \le 0$.

(ii) The function $g(t) = f(t)/t$ is operator monotone on $(0, \alpha)$.

Proof. (i) $\Rightarrow$ (ii): Let $0 < A \le B$. Then $0 < A^{1/2} \le B^{1/2}$. Hence, $B^{-1/2}A^{1/2}$ is a contraction by Lemma V.1.7. Therefore, using Theorem V.2.3 we see that

$$f(A) = f(A^{1/2}B^{-1/2}\, B\, B^{-1/2}A^{1/2}) \le A^{1/2}B^{-1/2} f(B) B^{-1/2}A^{1/2}.$$

From this, one obtains, using Lemma V.1.5,

$$A^{-1/2} f(A) A^{-1/2} \le B^{-1/2} f(B) B^{-1/2}.$$

Since all functions of an operator commute with each other, this shows that $A^{-1}f(A) \le B^{-1}f(B)$. Thus, $g$ is operator monotone.

(ii) $\Rightarrow$ (i): If $f(t)/t$ is monotone on $(0, \alpha)$ we must have $f(0) \le 0$. We will show that $f$ satisfies the condition (iv) of Theorem V.2.3. Let $P$ be any projection and let $A$ be any positive operator with spectrum in $(0, \alpha)$. Then there exists an $\varepsilon > 0$ such that $(1 + \varepsilon)A$ has its spectrum in $(0, \alpha)$. Since $P + \varepsilon I \le (1 + \varepsilon)I$, we have $A^{1/2}(P + \varepsilon I)A^{1/2} \le (1 + \varepsilon)A$. So, by the operator monotonicity of $g$, we have

$$A^{-1/2}(P + \varepsilon I)^{-1}A^{-1/2} f(A^{1/2}(P + \varepsilon I)A^{1/2}) \le (1 + \varepsilon)^{-1} A^{-1} f((1 + \varepsilon)A).$$

Multiply both sides on the right by $A^{1/2}(P + \varepsilon I)$ and on the left by its conjugate $(P + \varepsilon I)A^{1/2}$. This gives

$$A^{-1/2} f(A^{1/2}(P + \varepsilon I)A^{1/2}) A^{1/2}(P + \varepsilon I) \le (1 + \varepsilon)^{-1}(P + \varepsilon I) f((1 + \varepsilon)A)(P + \varepsilon I).$$

Let $\varepsilon \to 0$. This gives

$$A^{-1/2} f(A^{1/2}PA^{1/2}) A^{1/2}P \le Pf(A)P.$$

Use the identity in Exercise V.2.8 to reduce this to $Pf(PAP) \le Pf(A)P$, and then use the inequality in Exercise V.2.7 to conclude that $f(PAP) \le Pf(A)P$, as desired. ■


As corollaries to the above results, we deduce the following statements about the power functions.

Theorem V.2.10 On the positive half-line $(0, \infty)$ the functions $f(t) = t^r$, where $r$ is a real number, are operator monotone if and only if $0 \le r \le 1$.

Proof. If $0 \le r \le 1$, we know that $f(t) = t^r$ is operator monotone by Theorem V.1.9. If $r$ is not in $[0, 1]$, then the function $f(t) = t^r$ is not concave on $(0, \infty)$. Therefore, it cannot be operator monotone by Theorem V.2.5. ■


Exercise V.2.11 Consider the functions $f(t) = t^r$ on $(0, \infty)$. Use Theorems V.2.9 and V.2.10 to show that if $r \ge 0$, then $f(t)$ is operator convex if and only if $1 \le r \le 2$. Use Corollary V.2.6 to show that $f(t)$ is operator convex if $-1 \le r \le 0$. (We will see later that $f(t)$ is not operator convex for any other value of $r$.)

Exercise V.2.12 A function $f$ from $(0, \infty)$ into itself is both operator monotone and operator convex if and only if it is of the form $f(t) = \alpha + \beta t$, $\alpha, \beta \ge 0$.

Exercise V.2.13 Show that the function $f(t) = -t \log t$ is operator concave on $(0, \infty)$.
Let $I$ be the open interval $(-1,1)$. Let $f$ be a continuously differentiable function on $I$. Then we denote by $f^{[1]}$ the function on $I \times I$ defined as

$$f^{[1]}(\lambda,\mu) = \frac{f(\lambda) - f(\mu)}{\lambda - \mu} \quad \text{if } \lambda \ne \mu,$$

$$f^{[1]}(\lambda,\lambda) = f'(\lambda).$$

The expression $f^{[1]}(\lambda,\mu)$ is called the first divided difference of $f$ at $(\lambda,\mu)$.
If $\Lambda$ is a diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$, all of which are in $I$, we denote by $f^{[1]}(\Lambda)$ the $n \times n$ symmetric matrix whose $(i,j)$-entry is $f^{[1]}(\lambda_i, \lambda_j)$. If $A$ is Hermitian and $A = U\Lambda U^*$, let $f^{[1]}(A) = Uf^{[1]}(\Lambda)U^*$.
Now consider the induced map f on the set of Hermitian matrices with 
eigenvalues in J. Such matrices form an open set in the real vector space 
of all Hermitian matrices. The map f is called (Fréchet) differentiable at 
A if there exists a linear transformation Df(A) on the space of Hermitian 
matrices such that for all H 


$$\|f(A+H) - f(A) - Df(A)(H)\| = o(\|H\|). \tag{V.8}$$


The linear operator Df(A) is then called the derivative of f at A. Basic 
rules of the Fréchet differential calculus are summarised in Chapter 10. If 
$f$ is differentiable at $A$, then

$$Df(A)(H) = \left.\frac{d}{dt}\right|_{t=0} f(A+tH). \tag{V.9}$$

There is an interesting relationship between the derivative $Df(A)$ and the matrix $f^{[1]}(A)$. This is explored in the next few paragraphs.


Lemma V.3.1 Let $f$ be a polynomial function. Then for every diagonal matrix $\Lambda$ and for every Hermitian matrix $H$,

$$Df(\Lambda)(H) = f^{[1]}(\Lambda) \circ H, \tag{V.10}$$

where $\circ$ stands for the Schur-product of two matrices.


Proof. Both sides of (V.10) are linear in $f$. Therefore, it suffices to prove this for the powers $f(t) = t^p$, $p = 1, 2, 3, \ldots$ For such $f$, using (V.9) one gets

$$Df(\Lambda)(H) = \sum_{k=1}^{p} \Lambda^{k-1} H \Lambda^{p-k}.$$

This is a matrix whose $(i,j)$-entry is $\sum_{k=1}^{p} \lambda_i^{k-1}\lambda_j^{p-k} h_{ij}$. On the other hand, the $(i,j)$-entry of $f^{[1]}(\Lambda)$ is $\sum_{k=1}^{p} \lambda_i^{k-1}\lambda_j^{p-k}$. ∎
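The identity (V.10) can be checked exactly for a small case. The sketch below takes $f(t) = t^3$, for which the product rule gives $Df(\Lambda)(H) = \Lambda^2 H + \Lambda H \Lambda + H \Lambda^2$, and compares this with the Schur product $f^{[1]}(\Lambda) \circ H$ in rational arithmetic. The diagonal entries, the matrix $H$, and all helper names are assumptions made for illustration.

```python
from fractions import Fraction as Fr

def first_divided_difference(f, fprime, lam, mu):
    """f^[1](lam, mu) as defined in the text."""
    if lam == mu:
        return fprime(lam)
    return (f(lam) - f(mu)) / (lam - mu)

def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(*Ms):
    """Entrywise sum of several square matrices."""
    n = len(Ms[0])
    return [[sum(M[i][j] for M in Ms) for j in range(n)] for i in range(n)]

f = lambda t: t ** 3
fprime = lambda t: 3 * t ** 2

lams = [Fr(1, 2), Fr(-1, 4), Fr(1, 3)]   # illustrative diagonal entries in (-1, 1)
n = len(lams)
Lam = [[lams[i] if i == j else Fr(0) for j in range(n)] for i in range(n)]
H = [[Fr(1), Fr(2), Fr(0)],
     [Fr(2), Fr(-1), Fr(3)],
     [Fr(0), Fr(3), Fr(4)]]              # an illustrative real symmetric H

# Left side of (V.10): d/dt (Lam + tH)^3 at t = 0, expanded by the product rule.
Lam2 = matmul(Lam, Lam)
derivative = add(matmul(Lam2, H), matmul(matmul(Lam, H), Lam), matmul(H, Lam2))

# Right side of (V.10): Schur product of the divided-difference matrix with H.
loewner = [[first_divided_difference(f, fprime, lams[i], lams[j])
            for j in range(n)] for i in range(n)]
schur = [[loewner[i][j] * H[i][j] for j in range(n)] for i in range(n)]

print(derivative == schur)
```

Using `Fraction` keeps the comparison exact, so no floating-point tolerance is needed.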
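Corollary V.3.2 If $A = U\Lambda U^*$ and $f$ is a polynomial function, then

$$Df(A)(H) = U[f^{[1]}(\Lambda) \circ (U^*HU)]U^*. \tag{V.11}$$

Proof. Note that

$$\left.\frac{d}{dt}\right|_{t=0} f(U\Lambda U^* + tH) = U\left[\left.\frac{d}{dt}\right|_{t=0} f(\Lambda + tU^*HU)\right]U^*,$$

and use (V.10). ∎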


Theorem V.3.3 Let $f \in C^1(I)$ and let $A$ be a Hermitian matrix with all its eigenvalues in $I$. Then

$$Df(A)(H) = f^{[1]}(A) \circ H, \tag{V.12}$$

where $\circ$ denotes the Schur-product in a basis in which $A$ is diagonal.
Proof. Let $A = U\Lambda U^*$, where $\Lambda$ is diagonal. We want to prove that

$$Df(A)(H) = U[f^{[1]}(\Lambda) \circ (U^*HU)]U^*. \tag{V.13}$$

This has been proved for all polynomials $f$. We will extend its validity to all $f \in C^1$ by a continuity argument.

Denote the right-hand side of (V.13) by $\mathcal{D}f(A)(H)$. For each $f$ in $C^1$, $\mathcal{D}f(A)$ is a linear map on Hermitian matrices. We have

$$\|\mathcal{D}f(A)(H)\|_2 = \|f^{[1]}(\Lambda) \circ (U^*HU)\|_2.$$

All entries of the matrix $f^{[1]}(\Lambda)$ are bounded by $\max_{|t| \le \|A\|} |f'(t)|$. (Use the mean value theorem.) Hence

$$\|\mathcal{D}f(A)(H)\|_2 \le \max_{|t| \le \|A\|} |f'(t)| \, \|H\|_2. \tag{V.14}$$


Let $H$ be a Hermitian matrix with norm so small that the eigenvalues of $A + H$ are in $I$. Let $[a,b]$ be a closed interval in $I$ containing the eigenvalues of both $A$ and $A + H$. Choose a sequence of polynomials $f_n$ such that $f_n \to f$ and $f_n' \to f'$ uniformly on $[a,b]$. Let $\mathcal{L}$ be the line segment joining $A$ and $A + H$ in the space of Hermitian matrices. Then, by the mean value theorem (for Fréchet derivatives), we have

$$\|f_m(A+H) - f_n(A+H) - (f_m(A) - f_n(A))\| \le \|H\| \sup_{X \in \mathcal{L}} \|Df_m(X) - Df_n(X)\| = \|H\| \sup_{X \in \mathcal{L}} \|\mathcal{D}f_m(X) - \mathcal{D}f_n(X)\|. \tag{V.15}$$

This is so because we have already shown that $Df_n = \mathcal{D}f_n$ for the polynomial functions $f_n$.

Let $\varepsilon$ be any positive real number. The inequality (V.14) ensures that there exists a positive integer $n_0$ such that for $m, n \ge n_0$ we have

$$\sup_{X \in \mathcal{L}} \|\mathcal{D}f_m(X) - \mathcal{D}f_n(X)\| \le \frac{\varepsilon}{3} \tag{V.16}$$

and

$$\|\mathcal{D}f_n(A) - \mathcal{D}f(A)\| \le \frac{\varepsilon}{3}. \tag{V.17}$$

Let $m \to \infty$ and use (V.15) and (V.16) to conclude that

$$\|f(A+H) - f(A) - (f_n(A+H) - f_n(A))\| \le \frac{\varepsilon}{3}\|H\|. \tag{V.18}$$

If $\|H\|$ is sufficiently small, then by the definition of the Fréchet derivative, we have

$$\|f_n(A+H) - f_n(A) - \mathcal{D}f_n(A)(H)\| \le \frac{\varepsilon}{3}\|H\|. \tag{V.19}$$

Now we can write, using the triangle inequality,

$$\|f(A+H) - f(A) - \mathcal{D}f(A)(H)\| \le \|f(A+H) - f(A) - (f_n(A+H) - f_n(A))\| + \|f_n(A+H) - f_n(A) - \mathcal{D}f_n(A)(H)\| + \|(\mathcal{D}f(A) - \mathcal{D}f_n(A))(H)\|,$$

and then use (V.17), (V.18), and (V.19) to conclude that, for $\|H\|$ sufficiently small, we have

$$\|f(A+H) - f(A) - \mathcal{D}f(A)(H)\| \le \varepsilon\|H\|.$$

But this says that $Df(A) = \mathcal{D}f(A)$. ∎


Let $t \mapsto A(t)$ be a $C^1$ map from the interval $[0,1]$ into the space of Hermitian matrices that have all their eigenvalues in $I$. Let $f \in C^1(I)$, and let $F(t) = f(A(t))$. Then, by the chain rule, $DF(t) = Df(A(t))(A'(t))$. Therefore, by the theorem above, we have

$$F(1) - F(0) = \int_0^1 f^{[1]}(A(t)) \circ A'(t)\, dt, \tag{V.20}$$

where for each $t$ the Schur-product is taken in a basis that diagonalises $A(t)$.


Theorem V.3.4 Let $f \in C^1(I)$. Then $f$ is operator monotone on $I$ if and only if, for every Hermitian matrix $A$ whose eigenvalues are in $I$, the matrix $f^{[1]}(A)$ is positive.


Proof. Let $f$ be operator monotone, and let $A$ be a Hermitian matrix whose eigenvalues are in $I$. Let $H$ be the matrix all of whose entries are 1. Then $H$ is positive. So, $A + tH \ge A$ if $t \ge 0$. Hence, $f(A+tH) - f(A)$ is positive for small positive $t$. This implies that $Df(A)(H) \ge 0$. So, by Theorem V.3.3, $f^{[1]}(A) \circ H \ge 0$. But, for this special choice of $H$, this just says that $f^{[1]}(A) \ge 0$.

To prove the converse, let $A, B$ be Hermitian matrices whose eigenvalues are in $I$, and let $B \ge A$. Let $A(t) = (1-t)A + tB$, $0 \le t \le 1$. Then $A(t)$ also has all its eigenvalues in $I$. So, by the hypothesis, $f^{[1]}(A(t)) \ge 0$ for all $t$. Note that $A'(t) = B - A \ge 0$, for all $t$. Since the Schur-product of two positive matrices is positive, $f^{[1]}(A(t)) \circ A'(t)$ is positive for all $t$. So, by (V.20), $f(B) - f(A) \ge 0$. ∎
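The criterion of Theorem V.3.4 is easy to test on $2 \times 2$ diagonal matrices. The sketch below is only an illustration under stated assumptions: it uses $f(t) = \sqrt{1+t}$, which is operator monotone on $(-1,1)$ because it is a shift of the square root covered by Theorem V.2.10, and $f(t) = t^3$, which is monotone as a scalar function but whose divided-difference matrix fails to be positive. The sample points and helper names are chosen here, not taken from the text.

```python
import math

def loewner_2x2(f, fprime, s, t):
    """The matrix f^[1](A) for A = diag(s, t) with s != t."""
    dd = (f(s) - f(t)) / (s - t)
    return [[fprime(s), dd], [dd, fprime(t)]]

def is_positive(M):
    """2x2 real symmetric positivity test via diagonal entries and determinant."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] >= 0 and M[1][1] >= 0 and det >= 0

s, t = 0.5, 0.0   # illustrative points in (-1, 1)

root = loewner_2x2(lambda x: math.sqrt(1 + x),
                   lambda x: 0.5 / math.sqrt(1 + x), s, t)
cube = loewner_2x2(lambda x: x ** 3, lambda x: 3 * x ** 2, s, t)

print(is_positive(root))   # consistent with operator monotonicity
print(is_positive(cube))   # a non-positive matrix rules it out for t^3
```

A single non-positive matrix is enough to rule out operator monotonicity, whereas a positive one for a particular choice of points proves nothing by itself.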


Lemma V.3.5 If $f$ is continuous and operator monotone on $(-1,1)$, then for each $-1 \le \lambda \le 1$ the function $g_\lambda(t) = (t + \lambda) f(t)$ is operator convex.


Proof. We will prove this using Theorem V.2.9. First assume that $f$ is continuous and operator monotone on $[-1,1]$. Then the function $f(t-1)$ is operator monotone on $[0,2)$. Let $g(t) = tf(t-1)$. Then $g(0) = 0$ and the function $g(t)/t$ is operator monotone on $(0,2)$. Hence, by Theorem V.2.9, $g(t)$ is operator convex on $[0,2)$. This implies that the function $h_1(t) = g(t+1) = (t+1)f(t)$ is operator convex on $[-1,1)$. Instead of $f(t)$, if the same argument is applied to the function $-f(-t)$, which is also operator monotone on $[-1,1]$, we see that the function $h_2(t) = -(t+1)f(-t)$ is operator convex on $[-1,1)$. Changing $t$ to $-t$ preserves convexity. So the function $h_3(t) = h_2(-t) = (t-1)f(t)$ is also operator convex. But for $|\lambda| \le 1$, $g_\lambda(t) = \frac{1+\lambda}{2}h_1(t) + \frac{1-\lambda}{2}h_3(t)$ is a convex combination of $h_1$ and $h_3$. So $g_\lambda$ is also operator convex.

Now, given $f$ continuous and operator monotone on $(-1,1)$, the function $f((1-\varepsilon)t)$ is continuous and operator monotone on $[-1,1]$ for each $\varepsilon > 0$. Hence, by the special case considered above, the function $(t+\lambda)f((1-\varepsilon)t)$ is operator convex. Let $\varepsilon \to 0$, and conclude that the function $(t+\lambda)f(t)$ is operator convex. ∎


The next theorem says that every operator monotone function on $I$ is in the class $C^1$. Later on, we will see that it is actually in the class $C^\infty$.
(This is so even if we do not assume that it is continuous to begin with.) 
In the proof we make use of some differentiability properties of convex 
functions and smoothing techniques. For the reader’s convenience, these 
are summarised in Appendices 1 and 2 at the end of the chapter. 


Theorem V.3.6 Every operator monotone function f on I is continuously 
differentiable. 


Proof. Let $0 < \varepsilon < 1$, and let $f_\varepsilon$ be a regularisation of $f$ of order $\varepsilon$. (See Appendix 2.) Then $f_\varepsilon$ is a $C^\infty$ function on $(-1+\varepsilon, 1-\varepsilon)$. It is also operator monotone. Let $\tilde f(t) = \lim_{\varepsilon \to 0} f_\varepsilon(t)$. Then $\tilde f(t) = \frac{1}{2}[f(t+) + f(t-)]$.

Let $g_\varepsilon(t) = (t+1)f_\varepsilon(t)$. Then, by Lemma V.3.5, $g_\varepsilon$ is operator convex. Let $\tilde g(t) = \lim_{\varepsilon \to 0} g_\varepsilon(t)$. Then $\tilde g(t)$ is operator convex. But every convex function (on an open interval) is continuous. So $\tilde g(t)$ is continuous. Since $\tilde g(t) = (t+1)\tilde f(t)$ and $t+1 > 0$ on $I$, this means that $\tilde f(t)$ is continuous. Hence $\tilde f(t) = f(t)$. We thus have shown that $f$ is continuous.

Let $g(t) = (t+1)f(t)$. Then $g$ is a convex function on $I$. So $g$ is left and right differentiable and the one-sided derivatives satisfy the properties

$$g'_-(t) \le g'_+(t), \qquad \lim_{s \downarrow t} g'_\pm(s) = g'_+(t), \qquad \lim_{s \uparrow t} g'_\pm(s) = g'_-(t). \tag{V.21}$$

But $g'_\pm(t) = f(t) + (t+1)f'_\pm(t)$. Since $t+1 > 0$, the derivatives $f'_\pm(t)$ also satisfy relations like (V.21).

Now let $A = \begin{pmatrix} s & 0 \\ 0 & t \end{pmatrix}$, $s, t \in (-1,1)$. If $\varepsilon$ is sufficiently small, $s, t$ are in $(-1+\varepsilon, 1-\varepsilon)$. Since $f_\varepsilon$ is operator monotone on this interval, by Theorem V.3.4, the matrix $f_\varepsilon^{[1]}(A)$ is positive. This implies that

$$\left(\frac{f_\varepsilon(s) - f_\varepsilon(t)}{s-t}\right)^2 \le f_\varepsilon'(s)\, f_\varepsilon'(t).$$

Let $\varepsilon \to 0$. Since $f_\varepsilon \to f$ uniformly on compact sets, $f_\varepsilon(s) - f_\varepsilon(t)$ converges to $f(s) - f(t)$. Also, $f_\varepsilon'(t)$ converges to $\frac{1}{2}[f'_+(t) + f'_-(t)]$. Therefore, the above inequality gives, in the limit, the inequality

$$\left(\frac{f(s) - f(t)}{s-t}\right)^2 \le \frac{1}{4}[f'_+(s) + f'_-(s)][f'_+(t) + f'_-(t)].$$

Now let $s \downarrow t$, and use the fact that the derivatives of $f$ satisfy relations like (V.21). This gives

$$[f'_+(t)]^2 \le \frac{1}{4}[f'_+(t) + f'_+(t)][f'_+(t) + f'_-(t)],$$

which implies that $f'_+(t) = f'_-(t)$. Hence $f$ is differentiable. The relations (V.21), which are satisfied by $f$ too, show that $f'$ is continuous. ∎


Just as monotonicity of functions can be studied via first divided differences, convexity requires second divided differences. These are defined as follows. Let $f$ be twice continuously differentiable on the interval $I$. Then $f^{[2]}$ is a function defined on $I \times I \times I$ as follows. If $\lambda_1, \lambda_2, \lambda_3$ are distinct,

$$f^{[2]}(\lambda_1, \lambda_2, \lambda_3) = \frac{f^{[1]}(\lambda_1, \lambda_2) - f^{[1]}(\lambda_1, \lambda_3)}{\lambda_2 - \lambda_3}.$$

For other values of $\lambda_1, \lambda_2, \lambda_3$, $f^{[2]}$ is defined by continuity; e.g.,

$$f^{[2]}(\lambda, \lambda, \lambda) = \frac{1}{2} f''(\lambda).$$


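Exercise V.3.7 Show that if $\lambda_1, \lambda_2, \lambda_3$ are distinct, then $f^{[2]}(\lambda_1, \lambda_2, \lambda_3)$ is the quotient of the two determinants

$$\begin{vmatrix} f(\lambda_1) & f(\lambda_2) & f(\lambda_3) \\ \lambda_1 & \lambda_2 & \lambda_3 \\ 1 & 1 & 1 \end{vmatrix} \quad \text{and} \quad \begin{vmatrix} \lambda_1^2 & \lambda_2^2 & \lambda_3^2 \\ \lambda_1 & \lambda_2 & \lambda_3 \\ 1 & 1 & 1 \end{vmatrix}.$$

Hence the function $f^{[2]}$ is symmetric in its three arguments.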
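Exercise V.3.8 If $f(t) = t^m$, $m = 2, 3, \ldots$, show that

$$f^{[2]}(\lambda_1, \lambda_2, \lambda_3) = \sum_{\substack{0 \le p, q, r \\ p+q+r = m-2}} \lambda_1^p \lambda_2^q \lambda_3^r.$$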
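Before attempting the last two exercises, it can be reassuring to confirm the claimed formulas on a sample. The sketch below evaluates the second divided difference of $f(t) = t^5$ from the definition and compares it with the monomial sum of Exercise V.3.8, and also checks symmetry under a permutation of the arguments. The sample points and function names are assumptions made for this illustration, and a numerical check is of course not a proof.

```python
from fractions import Fraction as Fr
from itertools import product

def dd1(f, a, b):
    """First divided difference for distinct a, b."""
    return (f(a) - f(b)) / (a - b)

def dd2(f, l1, l2, l3):
    """Second divided difference for distinct arguments, as defined in the text."""
    return (dd1(f, l1, l2) - dd1(f, l1, l3)) / (l2 - l3)

def monomial_sum(m, l1, l2, l3):
    """Sum of l1^p l2^q l3^r over p, q, r >= 0 with p + q + r = m - 2."""
    total = Fr(0)
    for p, q in product(range(m - 1), repeat=2):
        r = m - 2 - p - q
        if r >= 0:
            total += l1 ** p * l2 ** q * l3 ** r
    return total

m = 5
f = lambda t: t ** m
pts = (Fr(1, 2), Fr(-1, 3), Fr(1, 5))    # illustrative distinct points in (-1, 1)

lhs = dd2(f, *pts)                       # from the definition
rhs = monomial_sum(m, *pts)              # formula of Exercise V.3.8
permuted = dd2(f, pts[2], pts[0], pts[1])

print(lhs == rhs, lhs == permuted)
```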


Exercise V.3.9 (i) Let $f(t) = t^m$, $m \ge 2$. Let $A$ be an $n \times n$ diagonal matrix; $A = \sum_{i=1}^{n} \lambda_i P_i$, where $P_i$ are the projections onto the coordinate axes. Show that for every $H$

$$\left.\frac{d^2}{dt^2}\right|_{t=0} f(A+tH) = 2\sum_{p+q+r=m-2} A^p H A^q H A^r = 2\sum_{p+q+r=m-2}\ \sum_{1 \le i,j,k \le n} \lambda_i^p \lambda_j^q \lambda_k^r\, P_i H P_j H P_k,$$

and

$$\left.\frac{d^2}{dt^2}\right|_{t=0} f(A+tH) = 2\sum_{i,j,k} f^{[2]}(\lambda_i, \lambda_j, \lambda_k)\, P_i H P_j H P_k. \tag{V.22}$$

(ii) Use a continuity argument, like the one used in the proof of Theorem V.3.3, to show that this last formula is valid for all $C^2$ functions $f$.
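Formula (V.22) can be sanity-checked in the smallest nontrivial case $f(t) = t^3$. Expanding $(A+tH)^3$ shows that the coefficient of $t^2$ is $AHH + HAH + HHA$, so the second derivative at $t = 0$ is twice that; on the other side, Exercise V.3.8 gives $f^{[2]}(\lambda_i, \lambda_j, \lambda_k) = \lambda_i + \lambda_j + \lambda_k$, and $P_i H P_j H P_k$ has the single entry $h_{ij}h_{jk}$ in position $(i,k)$. The sketch below compares the two sides exactly; the data and helper names are illustrative assumptions.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

lams = [Fr(1, 2), Fr(-1, 4), Fr(1, 3)]   # illustrative diagonal entries
n = len(lams)
A = [[lams[i] if i == j else Fr(0) for j in range(n)] for i in range(n)]
H = [[Fr(1), Fr(2), Fr(0)],
     [Fr(2), Fr(-1), Fr(3)],
     [Fr(0), Fr(3), Fr(4)]]              # illustrative real symmetric H

# Left side: second derivative of (A + tH)^3 at t = 0 is 2(AHH + HAH + HHA).
HH = matmul(H, H)
AHH, HAH, HHA = matmul(A, HH), matmul(matmul(H, A), H), matmul(HH, A)
second_derivative = [[2 * (AHH[i][k] + HAH[i][k] + HHA[i][k])
                      for k in range(n)] for i in range(n)]

# Right side of (V.22): entry (i, k) is 2 * sum_j f^[2](l_i, l_j, l_k) h_ij h_jk,
# with f^[2](a, b, c) = a + b + c for f(t) = t^3.
rhs = [[2 * sum((lams[i] + lams[j] + lams[k]) * H[i][j] * H[j][k]
                for j in range(n))
        for k in range(n)] for i in range(n)]

print(second_derivative == rhs)
```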


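Theorem V.3.10 If $f \in C^2(I)$ and $f$ is operator convex, then for each $\mu \in I$ the function $g(\lambda) = f^{[1]}(\mu, \lambda)$ is operator monotone.


Proof. Since $f$ is in the class $C^2$, $g$ is in the class $C^1$. So, by Theorem V.3.4, it suffices to prove that, for each $n$, the $n \times n$ matrix with entries $g^{[1]}(\lambda_i, \lambda_j)$ is positive for all $\lambda_1, \ldots, \lambda_n$ in $I$.

Fix $n$ and choose any $\lambda_1, \ldots, \lambda_{n+1}$ in $I$. Let $A$ be the diagonal matrix with entries $\lambda_1, \ldots, \lambda_{n+1}$. Since $f$ is operator convex and is twice differentiable, for every Hermitian matrix $H$, the matrix $\left.\frac{d^2}{dt^2}\right|_{t=0} f(A+tH)$ must be positive. If we write $P_1, \ldots, P_{n+1}$ for the projections onto the coordinate axes, we have an explicit expression for this second derivative in (V.22). Choose $H$ to be of the form

$$H = \begin{pmatrix} 0 & 0 & \cdots & \bar\xi_1 \\ 0 & 0 & \cdots & \bar\xi_2 \\ \vdots & \vdots & & \vdots \\ \xi_1 & \xi_2 & \cdots & 0 \end{pmatrix},$$

where $\xi_1, \ldots, \xi_n$ are any complex numbers. Let $x$ be the $(n+1)$-vector $(1, 1, \ldots, 1, 0)$. Then

$$\langle x, P_i H P_j H P_k x \rangle = \bar\xi_i\, \xi_k\, \delta_{j,n+1} \tag{V.23}$$

for $1 \le i, k \le n$ and $1 \le j \le n+1$, where $\delta_{j,n+1}$ is equal to 1 if $j = n+1$, and is equal to 0 otherwise. (If $i$ or $k$ equals $n+1$, the left-hand side vanishes, because the last coordinate of $x$ is 0.) So, using the positivity of the matrix (V.22) and then (V.23), we have

$$0 \le \sum_{1 \le i,j,k \le n+1} f^{[2]}(\lambda_i, \lambda_j, \lambda_k) \langle x, P_i H P_j H P_k x \rangle = \sum_{1 \le i,k \le n} f^{[2]}(\lambda_i, \lambda_{n+1}, \lambda_k)\, \bar\xi_i\, \xi_k.$$

But, by the symmetry of $f^{[2]}$,

$$f^{[2]}(\lambda_i, \lambda_{n+1}, \lambda_k) = \frac{f^{[1]}(\lambda_{n+1}, \lambda_i) - f^{[1]}(\lambda_{n+1}, \lambda_k)}{\lambda_i - \lambda_k} = g^{[1]}(\lambda_i, \lambda_k)$$

(putting $\lambda_{n+1} = \mu$ in the definition of $g$). So we have

$$0 \le \sum_{1 \le i,k \le n} g^{[1]}(\lambda_i, \lambda_k)\, \bar\xi_i\, \xi_k.$$

Since $\xi_i$ are arbitrary complex numbers, this is equivalent to saying that the $n \times n$ matrix $[g^{[1]}(\lambda_i, \lambda_k)]$ is positive. ∎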


Corollary V.3.11 If $f \in C^2(I)$, $f(0) = 0$, and $f$ is operator convex, then the function $g(t) = \frac{f(t)}{t}$ is operator monotone.


Proof. By the theorem above, the function $f^{[1]}(0, t)$ is operator monotone. But this is just the function $f(t)/t$ in this case. ∎


Corollary V.3.12 If $f$ is operator monotone on $I$ and $f(0) = 0$, then the function $g(t) = \frac{t+\lambda}{t} f(t)$ is operator monotone for $|\lambda| \le 1$.


Proof. First assume that $f \in C^2(I)$. By Lemma V.3.5, the function $g_\lambda(t) = (t+\lambda)f(t)$ is operator convex. By Corollary V.3.11, therefore, $g(t) = g_\lambda(t)/t$ is operator monotone.

If $f$ is not in the class $C^2$, consider its regularisations $f_\varepsilon$. These are in $C^2$. Apply the special case of the above paragraph to the functions $f_\varepsilon(t) - f_\varepsilon(0)$, and then let $\varepsilon \to 0$. ∎


Corollary V.3.13 If $f$ is operator monotone on $I$ and $f(0) = 0$, then $f$ is twice differentiable at 0.


Proof. By Corollary V.3.12, the function $g(t) = (1 + \frac{1}{t}) f(t)$ is operator monotone, and by Theorem V.3.6, it is continuously differentiable. So the function $h$ defined as $h(t) = \frac{1}{t} f(t)$, $h(0) = f'(0)$ is continuously differentiable. This implies that $f$ is twice differentiable at 0. ∎


Exercise V.3.14 Let $f$ be a continuous operator monotone function on $I$. Then the function $F(t) = \int_0^t f(s)\, ds$ is operator convex.


Exercise V.3.15 Let $f \in C^1(I)$. Then $f$ is operator convex if and only if for all Hermitian matrices $A, B$ with eigenvalues in $I$ we have

$$f(A) - f(B) \ge f^{[1]}(B) \circ (A - B),$$

where $\circ$ denotes the Schur-product in a basis in which $B$ is diagonal.
V.4 Loewner’s Theorems


Consider all functions $f$ on the interval $I = (-1,1)$ that are operator monotone and satisfy the conditions

$$f(0) = 0, \qquad f'(0) = 1. \tag{V.24}$$


Let K be the collection of all such functions. Clearly, K is a convex set. We 
will show that this set is compact in the topology of pointwise convergence 
and will find its extreme points. This will enable us to write an integral 
representation for functions in K. 


Lemma V.4.1 If $f \in K$, then

$$f(t) \le \frac{t}{1-t} \quad \text{for } 0 \le t < 1,$$

$$f(t) \ge \frac{t}{1+t} \quad \text{for } -1 < t < 0,$$

$$|f''(0)| \le 2.$$


Proof. Let $A = \begin{pmatrix} t & 0 \\ 0 & 0 \end{pmatrix}$. By Theorem V.3.4, the matrix

$$f^{[1]}(A) = \begin{pmatrix} f'(t) & f(t)/t \\ f(t)/t & 1 \end{pmatrix}$$

is positive. Hence,

$$\frac{f(t)^2}{t^2} \le f'(t). \tag{V.25}$$

Let $g_\pm(t) = (t \pm 1)f(t)$. By Lemma V.3.5, both functions $g_\pm$ are convex. Hence their derivatives are monotonically increasing functions. Since $g'_\pm(t) = f(t) + (t \pm 1)f'(t)$ and $g'_\pm(0) = \pm 1$, this implies that

$$f(t) + (t-1)f'(t) \ge -1 \quad \text{for } t > 0 \tag{V.26}$$

and

$$f(t) + (t+1)f'(t) \le 1 \quad \text{for } t < 0. \tag{V.27}$$

From (V.25) and (V.26) we obtain

$$f(t) + 1 \ge \frac{(1-t)f(t)^2}{t^2} \quad \text{for } t > 0. \tag{V.28}$$

Now suppose that for some $0 < t < 1$ we have $f(t) > \frac{t}{1-t}$. Then $f(t)^2 > \frac{t}{1-t} f(t)$. So, from (V.28), we get $f(t) + 1 > \frac{f(t)}{t}$. But this gives the inequality $f(t) < \frac{t}{1-t}$, which contradicts our assumption. This shows that $f(t) \le \frac{t}{1-t}$ for $0 \le t < 1$. The second inequality of the lemma is obtained by the same argument using (V.27) instead of (V.26).
We have seen in the proof of Corollary V.3.13 that

$$f'(0) + \frac{1}{2}f''(0) = \lim_{t \to 0} \frac{(1 + t^{-1})f(t) - f'(0)}{t}.$$

Let $t \downarrow 0$ and use the first inequality of the lemma to conclude that this limit is smaller than 2. Let $t \uparrow 0$, and use the second inequality to conclude that it is bigger than 0. Together, these two imply that $|f''(0)| \le 2$. ∎


Proposition V.4.2 The set K is compact in the topology of pointwise 
convergence. 


Proof. Let $\{f_i\}$ be any net in $K$. By the lemma above, the set $\{f_i(t)\}$ is bounded for each $t$. So, by Tychonoff’s Theorem, there exists a subnet $\{f_i\}$ that converges pointwise to a bounded function $f$. The limit function $f$ is operator monotone, and $f(0) = 0$. If we show that $f'(0) = 1$, we would have shown that $f \in K$, and hence that $K$ is compact.

By Corollary V.3.12, each of the functions $(1 + \frac{1}{t}) f_i(t)$ is monotone on $(-1,1)$. Since for all $i$, $\lim_{t \to 0} (1 + \frac{1}{t}) f_i(t) = f_i'(0) = 1$, we see that $(1 + \frac{1}{t}) f_i(t) \ge 1$ if $t \ge 0$ and is $\le 1$ if $t \le 0$. Hence, if $t > 0$, we have $(1 + \frac{1}{t}) f(t) \ge 1$; and if $t < 0$, we have the opposite inequality. Since $f$ is continuously differentiable, this shows that $f'(0) = 1$. ∎


Proposition V.4.3 All extreme points of the set $K$ have the form

$$f(t) = \frac{t}{1 - \alpha t}, \quad \text{where } \alpha = \frac{1}{2} f''(0).$$


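Proof. Let $f \in K$. For each $\lambda$, $-1 < \lambda < 1$, let

$$g_\lambda(t) = \left(1 + \frac{\lambda}{t}\right) f(t) - \lambda.$$

By Corollary V.3.12, $g_\lambda$ is operator monotone. Note that $g_\lambda(0) = 0$, since $f(0) = 0$ and $f'(0) = 1$. Also, $g_\lambda'(0) = 1 + \frac{1}{2}\lambda f''(0)$. So the function $h_\lambda$ defined as

$$h_\lambda(t) = \frac{1}{1 + \frac{1}{2}\lambda f''(0)} \left[\left(1 + \frac{\lambda}{t}\right) f(t) - \lambda\right]$$

is in $K$. Since $|f''(0)| \le 2$, we see that $|\frac{1}{2}\lambda f''(0)| < 1$. We can write

$$f = \frac{1}{2}\left(1 + \frac{1}{2}\lambda f''(0)\right) h_\lambda + \frac{1}{2}\left(1 - \frac{1}{2}\lambda f''(0)\right) h_{-\lambda}.$$

So, if $f$ is an extreme point of $K$, we must have $f = h_\lambda$. This says that

$$\left(1 + \frac{1}{2}\lambda f''(0)\right) f(t) = \left(1 + \frac{\lambda}{t}\right) f(t) - \lambda,$$

from which we can conclude that

$$f(t) = \frac{t}{1 - \frac{1}{2} f''(0)\, t}. \qquad ∎$$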


Theorem V.4.4 For each $f$ in $K$ there exists a unique probability measure $\mu$ on $[-1,1]$ such that

$$f(t) = \int_{-1}^{1} \frac{t}{1 - \lambda t}\, d\mu(\lambda). \tag{V.29}$$


Proof. For $-1 \le \lambda \le 1$, consider the functions $h_\lambda(t) = \frac{t}{1 - \lambda t}$. By Proposition V.4.3, the extreme points of $K$ are included in the family $\{h_\lambda\}$. Since $K$ is compact and convex, it must be the closed convex hull of its extreme points. (This is the Krein-Milman Theorem.) Finite convex combinations of elements of the family $\{h_\lambda : -1 \le \lambda \le 1\}$ can also be written as $\int h_\lambda\, d\nu(\lambda)$, where $\nu$ is a probability measure on $[-1,1]$ with finite support. Since $f$ is in the closure of these combinations, there exists a net $\{\nu_i\}$ of finitely supported probability measures on $[-1,1]$ such that the net $f_i(t) = \int h_\lambda(t)\, d\nu_i(\lambda)$ converges to $f(t)$. Since the space of the probability measures is weak* compact, the net $\nu_i$ has an accumulation point $\mu$. In other words, a subnet of $\int h_\lambda\, d\nu_i(\lambda)$ converges to $\int h_\lambda\, d\mu(\lambda)$. So $f(t) = \int h_\lambda(t)\, d\mu(\lambda) = \int \frac{t}{1 - \lambda t}\, d\mu(\lambda)$.

Now suppose that there are two measures $\mu_1$ and $\mu_2$ for which the representation (V.29) is valid. Expand the integrand as a power series $\frac{t}{1 - \lambda t} = \sum_{n=0}^{\infty} t^{n+1}\lambda^n$, convergent uniformly in $|\lambda| \le 1$ for every fixed $t$ with $|t| < 1$. This shows that

$$\sum_{n=0}^{\infty} t^{n+1} \int_{-1}^{1} \lambda^n\, d\mu_1(\lambda) = \sum_{n=0}^{\infty} t^{n+1} \int_{-1}^{1} \lambda^n\, d\mu_2(\lambda)$$

for all $|t| < 1$. The identity theorem for power series now shows that

$$\int_{-1}^{1} \lambda^n\, d\mu_1(\lambda) = \int_{-1}^{1} \lambda^n\, d\mu_2(\lambda), \quad n = 0, 1, 2, \ldots$$

But this is possible if and only if $\mu_1 = \mu_2$. ∎


One consequence of the uniqueness of the measure $\mu$ in the representation (V.29) is that every function $h_{\lambda_0}$ is an extreme point of $K$ (because it can be represented as an integral like this with $\mu$ concentrated at $\lambda_0$).
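The functions $h_\lambda(t) = t/(1-\lambda t)$ give a convenient test case for the estimates of Lemma V.4.1 and for the power series used in the uniqueness argument. The sketch below checks, on a grid of rational points chosen for illustration, that each $h_\lambda$ with $|\lambda| \le 1$ lies between the two bounds of the lemma, and that a partial sum of $\sum t^{n+1}\lambda^n$ reproduces $h_\lambda(t)$. The grid, the number of terms, and the names are assumptions of this sketch.

```python
from fractions import Fraction as Fr

def h(lam, t):
    """h_lambda(t) = t / (1 - lambda t)."""
    return t / (1 - lam * t)

lams = [Fr(k, 4) for k in range(-4, 5)]    # lambda in [-1, 1]
pos = [Fr(k, 10) for k in range(1, 10)]    # sample points with 0 < t < 1
neg = [-t for t in pos]                    # sample points with -1 < t < 0

# First and second inequalities of Lemma V.4.1, tested on the grid.
upper_ok = all(h(lam, t) <= t / (1 - t) for lam in lams for t in pos)
lower_ok = all(h(lam, t) >= t / (1 + t) for lam in lams for t in neg)

def partial_sum(lam, t, terms):
    """Partial sum of the series sum_{n >= 0} t^(n+1) lam^n."""
    return sum(t ** (n + 1) * lam ** n for n in range(terms))

lam0, t0 = Fr(1, 2), Fr(1, 3)
series_error = abs(h(lam0, t0) - partial_sum(lam0, t0, 30))

print(upper_ok, lower_ok, float(series_error))
```

Since $h_\lambda(t) = t + \lambda t^2 + \cdots$, one reads off $h_\lambda''(0) = 2\lambda$, in agreement with the bound $|f''(0)| \le 2$.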

The normalisations (V.24) were required to make the set K compact. 
They can now be removed. We have the following result. 




Corollary V.4.5 Let $f$ be a nonconstant operator monotone function on $(-1,1)$. Then there exists a unique probability measure $\mu$ on $[-1,1]$ such that

$$f(t) = f(0) + f'(0) \int_{-1}^{1} \frac{t}{1 - \lambda t}\, d\mu(\lambda). \tag{V.30}$$


Proof. Since $f$ is monotone and is not a constant, $f'(0) \ne 0$. Now note that the function $\frac{f(t) - f(0)}{f'(0)}$ is in $K$. ∎


It is clear from the representation (V.30) that every operator monotone 
function on (—1, 1) is infinitely differentiable. Hence, by the results of earlier 
sections, every operator convex function is also infinitely differentiable. 


Theorem V.4.6 Let $f$ be a nonlinear operator convex function on $(-1,1)$. Then there exists a unique probability measure $\mu$ on $[-1,1]$ such that

$$f(t) = f(0) + f'(0)\,t + \frac{1}{2} f''(0) \int_{-1}^{1} \frac{t^2}{1 - \lambda t}\, d\mu(\lambda). \tag{V.31}$$


Proof. Assume, without loss of generality, that $f(0) = 0$ and $f'(0) = 0$. Let $g(t) = f(t)/t$. Then $g$ is operator monotone by Corollary V.3.11, $g(0) = 0$, and $g'(0) = \frac{1}{2}f''(0)$. So $g$ has a representation like (V.30), from which the representation (V.31) for $f$ follows. ∎


We have noted that the integral representation (V.30) implies that every 
operator monotone function on (—1,1) is infinitely differentiable. In fact, 


we can conclude more. This representation shows that $f$ has an analytic continuation

$$f(z) = f(0) + f'(0) \int_{-1}^{1} \frac{z}{1 - \lambda z}\, d\mu(\lambda) \tag{V.32}$$

defined everywhere on the complex plane except on $(-\infty, -1] \cup [1, \infty)$. Note that

$$\mathrm{Im}\, \frac{z}{1 - \lambda z} = \frac{\mathrm{Im}\, z}{|1 - \lambda z|^2}.$$

So $f$ defined above maps the upper half-plane $H_+ = \{z : \mathrm{Im}\, z > 0\}$ into itself. It also maps the lower half-plane $H_-$ into itself. Further, $f(\bar z) = \overline{f(z)}$. In other words, the function $f$ on $H_-$ is an analytic continuation of $f$ on $H_+$ across the interval $(-1,1)$ obtained by reflection.
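These three properties of the continuation are easy to observe numerically when $\mu$ is finitely supported, so that the integral in (V.32) becomes a finite sum. The sketch below is an illustration only: the atoms, weights, sample points, and function names are assumptions made here.

```python
def continuation(z, atoms, f0=0.0, fprime0=1.0):
    """(V.32) for a finitely supported mu: atoms is a list of (lambda_k, weight_k)."""
    return f0 + fprime0 * sum(w * z / (1 - lam * z) for lam, w in atoms)

# Illustrative probability measure on [-1, 1] (weights sum to 1).
atoms = [(-1.0, 0.2), (0.3, 0.5), (1.0, 0.3)]
# Illustrative points in the upper half-plane.
samples = [0.5 + 0.1j, -2.0 + 3.0j, 4.0 + 0.01j, 1.0j]

# Im(z / (1 - lam z)) versus Im z / |1 - lam z|^2.
identity_error = max(
    abs((z / (1 - lam * z)).imag - z.imag / abs(1 - lam * z) ** 2)
    for z in samples for lam, _ in atoms)

# The upper half-plane is mapped into itself.
maps_upper = all(continuation(z, atoms).imag > 0 for z in samples)

# Reflection: f(conj z) = conj f(z).
reflection_error = max(
    abs(continuation(z.conjugate(), atoms) - continuation(z, atoms).conjugate())
    for z in samples)

print(identity_error, maps_upper, reflection_error)
```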

This is a very important observation, because there is a very rich theory 
of analytic functions in a half-plane that we can exploit now. Before doing 
so, let us now do away with the special interval (—1, 1). Note that a function 
$f$ is operator monotone on an interval $(a,b)$ if and only if the function $f\left(\frac{b-a}{2}t + \frac{b+a}{2}\right)$ is operator monotone on $(-1,1)$. So, all results obtained for operator monotone functions on $(-1,1)$ can be extended to functions on $(a,b)$. We have proved the following.


Theorem V.4.7 If $f$ is an operator monotone function on $(a,b)$, then $f$ has an analytic continuation to the upper half-plane $H_+$ that maps $H_+$ into itself. It also has an analytic continuation to the lower half-plane $H_-$, obtained by reflection across $(a,b)$.


The converse of this is also true: if a real function $f$ on $(a,b)$ has an analytic continuation to $H_+$ mapping $H_+$ into itself, then $f$ is operator monotone on $(a,b)$. This is proved below.

Let $P$ be the class of all complex analytic functions defined on $H_+$ with their ranges in the closed upper half-plane $\{z : \mathrm{Im}\, z \ge 0\}$. This is called the class of Pick functions. Since every nonconstant analytic function is an open map, if $f$ is a nonconstant Pick function, then the range of $f$ is contained in $H_+$. It is obvious that $P$ is a convex cone, and the composition of two nonconstant functions in $P$ is again in $P$.


Exercise V.4.8 (i) For $0 \le r \le 1$, the function $f(z) = z^r$ is in $P$.
(ii) The function $f(z) = \log z$ is in $P$.
(iii) The function $f(z) = \tan z$ is in $P$.
(iv) The function $f(z) = -\frac{1}{z}$ is in $P$.
(v) If $f$ is in $P$, then so is the function $-\frac{1}{f}$.
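A quick numerical experiment can build intuition for these examples before proving them. The sketch below evaluates a few of the functions at sample points of the upper half-plane (chosen here for illustration) using the principal branches provided by Python's `cmath` module, and contrasts them with $z^2$, which sends one of the sample points into the lower half-plane. This is a spot check at finitely many points, not a proof.

```python
import cmath

# Illustrative points with positive imaginary part.
samples = [0.3 + 0.2j, -1.5 + 0.7j, 2.0 + 1.0j, -0.1 + 2.5j]

candidates = {
    "z**0.5": lambda z: z ** 0.5,   # principal branch of the square root
    "log z": cmath.log,             # principal branch of the logarithm
    "tan z": cmath.tan,
    "-1/z": lambda z: -1 / z,
}

# For each candidate, record whether every sample lands in the upper half-plane.
in_upper = {name: all(g(z).imag > 0 for z in samples)
            for name, g in candidates.items()}

# z**2 is not a Pick function: (-1.5 + 0.7j)**2 has negative imaginary part.
square_image = samples[1] ** 2

print(in_upper)
print(square_image.imag < 0)
```

The failure of $z^2$ matches Theorem V.2.10, where $t^2$ was shown not to be operator monotone.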

Given any open interval (a,b), let P(a,b) be the class of Pick functions 
that admit an analytic continuation across (a,b) into the lower half-plane 
and the continuation is by reflection. In particular, such functions take only 
real values on (a,b), and if they are nonconstant, they assume real values 
only on (a,b). The set P(a,b) is a convex cone. 

Let $f \in P(a,b)$ and write $f(z) = u(z) + iv(z)$, where as usual $u(z)$ and $v(z)$ denote the real and imaginary parts of $f$. Since $v(x) = 0$ for $a < x < b$, we have $v(x+iy) - v(x) \ge 0$ if $y > 0$. This implies that the partial derivative $v_y(x) \ge 0$ and hence, by the Cauchy-Riemann equations, $u_x(x) \ge 0$. Thus, on the interval $(a,b)$, $f(x) = u(x)$ is monotone. In fact, we will soon see that $f$ is operator monotone on $(a,b)$. This is a consequence of a theorem of Nevanlinna that gives an integral representation of Pick functions. We will give a proof of this now using some elementary results from Fourier analysis. The idea is to use the conformal equivalence between $H_+$ and the unit disk $D$ to transfer the problem to $D$, and then study the real part $u$


of f. This is a harmonic function on D, so we can use standard facts from 
Fourier analysis. 


Theorem V.4.9 Let $u$ be a nonnegative harmonic function on the unit disk $D = \{z : |z| < 1\}$. Then there exists a finite measure $m$ on $[0, 2\pi]$ such that

$$u(re^{i\theta}) = \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r\cos(\theta - t)}\, dm(t). \tag{V.33}$$

Conversely, any function of this form is positive and harmonic on the unit disk $D$.


Proof. Let $u$ be any continuous real function defined on the closed unit disk that is harmonic in $D$. Then, by a well-known and elementary theorem in analysis,

$$u(re^{i\theta}) = \frac{1}{2\pi} \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r\cos(\theta - t)}\, u(e^{it})\, dt = \frac{1}{2\pi} \int_0^{2\pi} P_r(\theta - t)\, u(e^{it})\, dt, \tag{V.34}$$

where $P_r(\theta)$ is the Poisson kernel (defined by the above equation) for $0 \le r < 1$, $0 \le \theta \le 2\pi$. If $u$ is nonnegative, put $dm(t) = \frac{1}{2\pi} u(e^{it})\, dt$. Then $m$ is a positive measure on $[0, 2\pi]$. By the mean value property of harmonic functions, the total mass of this measure is

$$\frac{1}{2\pi} \int_0^{2\pi} u(e^{it})\, dt = u(0). \tag{V.35}$$

So we do have a representation of the form (V.33) under the additional hypothesis that $u$ is continuous on the closed unit disk.

The general case is a consequence of this. Let $u$ be positive and harmonic in $D$. Then, for $\varepsilon > 0$, the function $u_\varepsilon(z) = u\left(\frac{z}{1+\varepsilon}\right)$ is positive and harmonic in the disk $|z| < 1 + \varepsilon$. Therefore, it can be represented in the form (V.33) with a measure $m_\varepsilon(t)$ of finite total mass $u_\varepsilon(0) = u(0)$. As $\varepsilon \to 0$, $u_\varepsilon$ converges to $u$ uniformly on compact subsets of $D$. Since the measures $m_\varepsilon$ all have the same mass, using the weak* compactness of the space of probability measures, we conclude that there exists a positive measure $m$ such that

$$u(re^{i\theta}) = \lim_{\varepsilon \to 0} u_\varepsilon(re^{i\theta}) = \int_0^{2\pi} \frac{1 - r^2}{1 + r^2 - 2r\cos(\theta - t)}\, dm(t).$$

Conversely, since the Poisson kernel $P_r$ is nonnegative, any function represented by (V.33) is nonnegative. ∎


Theorem V.4.9 is often called the Herglotz Theorem. It says that every 


nonnegative harmonic function on the unit disk is the Poisson integral of 
a positive measure. 
Recall that two harmonic functions $u, v$ are called harmonic conjugates if the function $f(z) = u(z) + iv(z)$ is analytic. Every harmonic function $u$ has a harmonic conjugate that is uniquely determined up to an additive constant.


Theorem V.4.10 Let $f(z) = u(z) + iv(z)$ be analytic on the unit disk $D$. If $u(z) \ge 0$, then there exists a finite positive measure $m$ on $[0, 2\pi]$ such that

$$f(z) = \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z}\, dm(t) + iv(0). \tag{V.36}$$

Conversely, every function of this form is analytic on $D$ and has a positive real part.


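Proof. By Theorem V.4.9, the function $u$ can be written as in (V.33). The Poisson kernel $P_r$, $0 \le r < 1$, can be written as

$$P_r(\theta) = \frac{1 - r^2}{1 + r^2 - 2r\cos\theta} = \sum_{n=-\infty}^{\infty} r^{|n|} e^{in\theta} = \mathrm{Re}\, \frac{1 + re^{i\theta}}{1 - re^{i\theta}}.$$

Hence,

$$P_r(\theta - t) = \mathrm{Re}\, \frac{1 + re^{i(\theta - t)}}{1 - re^{i(\theta - t)}} = \mathrm{Re}\, \frac{e^{it} + re^{i\theta}}{e^{it} - re^{i\theta}},$$

and

$$u(z) = \mathrm{Re} \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z}\, dm(t).$$

So, $f(z)$ differs from this last integral only by an imaginary constant. Putting $z = 0$, one sees that this constant is $iv(0)$.
The converse statement is easy to prove. ∎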
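The three expressions for the Poisson kernel used in this proof can be compared numerically. The sketch below evaluates the closed form, a symmetric truncation of the series $\sum r^{|n|}e^{in\theta}$, and the real part of $(1 + re^{i\theta})/(1 - re^{i\theta})$ at one sample point; the sample values, the truncation level, and the function names are assumptions of this illustration.

```python
import cmath
import math

def poisson_closed(r, theta):
    """(1 - r^2) / (1 + r^2 - 2 r cos(theta))."""
    return (1 - r * r) / (1 + r * r - 2 * r * math.cos(theta))

def poisson_series(r, theta, N=200):
    """Symmetric partial sum of sum_n r^|n| e^{i n theta}, |n| <= N."""
    total = sum(r ** abs(n) * cmath.exp(1j * n * theta) for n in range(-N, N + 1))
    return total.real

def poisson_real_part(r, theta):
    """Re (1 + r e^{i theta}) / (1 - r e^{i theta})."""
    w = r * cmath.exp(1j * theta)
    return ((1 + w) / (1 - w)).real

r, theta = 0.6, 1.1   # illustrative values with 0 <= r < 1
values = (poisson_closed(r, theta),
          poisson_series(r, theta),
          poisson_real_part(r, theta))
spread = max(values) - min(values)

print(values, spread)
```

For $r = 0.6$ the neglected tail of the series is of order $r^{201}$, far below floating-point resolution, so the three values agree to rounding error.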


Next, note that the disk $D$ and the half-plane $H_+$ are conformally equivalent, i.e., there exists an analytic isomorphism between these two spaces. For $z \in D$, let

$$\zeta(z) = \frac{1}{i}\, \frac{z+1}{z-1}. \tag{V.37}$$

Then $\zeta \in H_+$. The inverse of this map is given by

$$z(\zeta) = \frac{\zeta - i}{\zeta + i}. \tag{V.38}$$

Using these transformations, we can establish an equivalence between the class $P$ and the class of analytic functions on $D$ with positive real part. If $f$ is a function in the latter class, let

$$\varphi(\zeta) = i f(z(\zeta)). \tag{V.39}$$

Then $\varphi \in P$. The inverse of this transformation is

$$f(z) = -i\varphi(\zeta(z)). \tag{V.40}$$
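The maps (V.37) to (V.39) can be exercised on a few points. The sketch below checks that sample points of the disk land in the upper half-plane, that (V.38) inverts (V.37), and that the function $f(z) = (1+z)/(1-z)$, which has positive real part on $D$, is carried by (V.39) to the Pick function $\varphi(\zeta) = \zeta$. The sample points and the names `zeta`, `z_of`, `f_example`, and `phi` are assumptions of this sketch.

```python
def zeta(z):
    """(V.37): map from the unit disk D to the upper half-plane."""
    return (z + 1) / (1j * (z - 1))

def z_of(w):
    """(V.38): the inverse map from the upper half-plane to D."""
    return (w - 1j) / (w + 1j)

def f_example(z):
    """An analytic function on D with positive real part (illustrative choice)."""
    return (1 + z) / (1 - z)

def phi(w):
    """(V.39) applied to f_example."""
    return 1j * f_example(z_of(w))

disk_points = [0j, 0.5 + 0.0j, -0.3 + 0.6j, 0.1 - 0.9j]   # all with |z| < 1
images = [zeta(z) for z in disk_points]

round_trip_error = max(abs(z_of(zeta(z)) - z) for z in disk_points)
w0 = 2 + 3j
phi_error = abs(phi(w0) - w0)

print([w.imag > 0 for w in images], round_trip_error, phi_error)
```

The last check mirrors the way the term $\beta\zeta$ arises in Nevanlinna's representation: a point mass of $m$ at $t = 0$ contributes a multiple of $(1+z)/(1-z)$ to $f$.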


Using these ideas we can prove the following theorem, called Nevan- 
linna’s Theorem. 


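Theorem V.4.11 A function $\varphi$ is in the Pick class if and only if it has a representation

$$\varphi(\zeta) = \alpha + \beta\zeta + \int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta}\, d\nu(\lambda), \tag{V.41}$$

where $\alpha$ is a real number, $\beta \ge 0$, and $\nu$ is a positive finite measure on the real line.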


Proof. Let $f$ be the function on $D$ associated with $\varphi$ via the transformation (V.40). By Theorem V.4.10, there exists a finite positive measure $m$ on $[0, 2\pi]$ such that

$$f(z) = \int_0^{2\pi} \frac{e^{it} + z}{e^{it} - z}\, dm(t) - i\alpha.$$

If $f(z) = u(z) + iv(z)$, then $\alpha = -v(0)$, and the total mass of $m$ is $u(0)$. If the measure $m$ has a positive mass at the singleton $\{0\}$, let this mass be $\beta$. Then the expression above reduces to

$$f(z) = \beta\, \frac{1+z}{1-z} + \int_{(0,2\pi)} \frac{e^{it} + z}{e^{it} - z}\, dm(t) - i\alpha.$$

Using the transformations (V.38) and (V.39), we get from this

$$\varphi(\zeta) = \alpha + \beta\zeta + i \int_{(0,2\pi)} \frac{e^{it} + \frac{\zeta - i}{\zeta + i}}{e^{it} - \frac{\zeta - i}{\zeta + i}}\, dm(t).$$

The last term above is equal to

$$\int_{(0,2\pi)} \frac{\zeta\cos\frac{t}{2} - \sin\frac{t}{2}}{\zeta\sin\frac{t}{2} + \cos\frac{t}{2}}\, dm(t).$$

Now, introduce a change of variables $\lambda = -\cot\frac{t}{2}$. This maps $(0, 2\pi)$ onto $(-\infty, \infty)$. The measure $m$ is transformed by the above map to a finite measure $\nu$ on $(-\infty, \infty)$ and the above integral is transformed to

$$\int_{-\infty}^{\infty} \frac{1 + \lambda\zeta}{\lambda - \zeta}\, d\nu(\lambda).$$

This shows that $\varphi$ can be represented in the form (V.41).
It is easy to see that every function of this form is a Pick function. ∎


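There is another form in which it is convenient to represent Pick functions. Note that

$$\frac{1 + \lambda\zeta}{\lambda - \zeta} = \left(\frac{1}{\lambda - \zeta} - \frac{\lambda}{\lambda^2 + 1}\right)(\lambda^2 + 1).$$

So, if we write $d\mu(\lambda) = (\lambda^2 + 1)\, d\nu(\lambda)$, then we obtain from (V.41) the representation

$$\varphi(\zeta) = \alpha + \beta\zeta + \int_{-\infty}^{\infty} \left[\frac{1}{\lambda - \zeta} - \frac{\lambda}{\lambda^2 + 1}\right] d\mu(\lambda), \tag{V.42}$$

where $\mu$ is a positive Borel measure on $\mathbb{R}$, for which $\int \frac{1}{\lambda^2 + 1}\, d\mu(\lambda)$ is finite. (A Borel measure on $\mathbb{R}$ is a measure defined on Borel sets that puts finite mass on bounded sets.)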


Now we turn to the question of uniqueness of the above representations. It is easy to see from (V.41) that

$$\alpha = \mathrm{Re}\, \varphi(i). \tag{V.43}$$

Therefore, $\alpha$ is uniquely determined by $\varphi$. Now let $\eta$ be any positive real number. From (V.41) we see that

$$\frac{\varphi(i\eta)}{i\eta} = \frac{\alpha}{i\eta} + \beta + \int_{-\infty}^{\infty} \frac{1 + \lambda^2 + i\lambda(\eta - \eta^{-1})}{\lambda^2 + \eta^2}\, d\nu(\lambda).$$

As $\eta \to \infty$, the integrand converges to 0 for each $\lambda$. The real and imaginary parts of the integrand are uniformly bounded by 1 when $\eta > 1$. So by the Lebesgue Dominated Convergence Theorem, the integral converges to 0 as $\eta \to \infty$. Thus,

$$\beta = \lim_{\eta \to \infty} \varphi(i\eta)/i\eta, \tag{V.44}$$

and thus $\beta$ is uniquely determined by $\varphi$.
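Formulas (V.43) and (V.44) can be tried out on a Pick function built from a finitely supported $\nu$, for which the integral in (V.41) is a finite sum. In the sketch below the values of $\alpha$, $\beta$, the atoms of $\nu$, and the function name `phi` are all assumptions chosen for illustration; the point is only to watch $\alpha$ being recovered exactly at $\zeta = i$ and $\beta$ being approached as $\eta$ grows.

```python
def phi(zeta, alpha, beta, atoms):
    """(V.41) for a finitely supported nu: atoms is a list of (lambda_k, mass_k)."""
    return alpha + beta * zeta + sum(
        m * (1 + lam * zeta) / (lam - zeta) for lam, m in atoms)

alpha, beta = 0.7, 2.0                            # illustrative parameters
atoms = [(-1.0, 0.5), (0.0, 1.5), (3.0, 0.25)]    # illustrative atoms of nu

# (V.43): alpha = Re phi(i), since the integrand equals i at zeta = i.
alpha_recovered = phi(1j, alpha, beta, atoms).real

# (V.44): beta is the limit of phi(i eta) / (i eta) as eta grows.
etas = [1e1, 1e3, 1e5]
beta_errors = [abs((phi(1j * eta, alpha, beta, atoms) / (1j * eta)).real - beta)
               for eta in etas]

print(alpha_recovered, beta_errors)
```

The real part of the quotient differs from $\beta$ by $\sum_k m_k (1+\lambda_k^2)/(\lambda_k^2+\eta^2)$, which explains the roughly $\eta^{-2}$ decay of the errors.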

Now we will prove that the measure $d\mu$ in (V.42) is uniquely determined by $\varphi$. Denote by $\mu$ the unique right continuous monotonically increasing function on $\mathbb{R}$ satisfying $\mu(0) = 0$ and $\mu((a,b]) = \mu(b) - \mu(a)$ for every interval $(a,b]$. (This is called the distribution function associated with $d\mu$.) We will prove the following result, called the Stieltjes inversion formula, from which it follows that $\mu$ is unique.


Theorem V.4.12 If the Pick function $\varphi$ is represented by (V.42), then for any $a, b$ that are points of continuity of the distribution function $\mu$ we have

$$\mu(b) - \mu(a) = \lim_{\eta \to 0} \frac{1}{\pi} \int_a^b \mathrm{Im}\, \varphi(x + i\eta)\, dx. \tag{V.45}$$
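Proof. From (V.42) we see that

$$\frac{1}{\pi} \int_a^b \mathrm{Im}\, \varphi(x + i\eta)\, dx = \frac{1}{\pi} \int_a^b \left[\beta\eta + \int_{-\infty}^{\infty} \frac{\eta}{(\lambda - x)^2 + \eta^2}\, d\mu(\lambda)\right] dx = \frac{1}{\pi} \left[\beta\eta(b-a) + \int_{-\infty}^{\infty} \int_a^b \frac{\eta\, dx}{(\lambda - x)^2 + \eta^2}\, d\mu(\lambda)\right],$$

the interchange of integrals being permissible by Fubini’s Theorem. As $\eta \to 0$, the first term in the square brackets above goes to 0. The inner integral can be calculated by the change of variables $u = \frac{x - \lambda}{\eta}$. This gives

$$\int_a^b \frac{\eta\, dx}{(x - \lambda)^2 + \eta^2} = \int_{(a-\lambda)/\eta}^{(b-\lambda)/\eta} \frac{du}{u^2 + 1} = \arctan\left(\frac{b - \lambda}{\eta}\right) - \arctan\left(\frac{a - \lambda}{\eta}\right).$$

So to prove (V.45), we have to show that

$$\mu(b) - \mu(a) = \lim_{\eta \to 0} \frac{1}{\pi} \int_{-\infty}^{\infty} \left[\arctan\left(\frac{b - \lambda}{\eta}\right) - \arctan\left(\frac{a - \lambda}{\eta}\right)\right] d\mu(\lambda).$$

We will use the following properties of the function arctan. This is a monotonically increasing odd function on $(-\infty, \infty)$ whose range is $(-\frac{\pi}{2}, \frac{\pi}{2})$. So,

$$0 \le \arctan\left(\frac{b - \lambda}{\eta}\right) - \arctan\left(\frac{a - \lambda}{\eta}\right) \le \pi.$$

If $(b - \lambda)$ and $(a - \lambda)$ have the same sign, then by the addition law for arctan we have,

$$\arctan\left(\frac{b - \lambda}{\eta}\right) - \arctan\left(\frac{a - \lambda}{\eta}\right) = \arctan \frac{\eta(b - a)}{\eta^2 + (b - \lambda)(a - \lambda)}.$$

If $x$ is positive, then

$$\arctan x = \int_0^x \frac{dt}{1 + t^2} \le \int_0^x dt = x.$$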


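Now, let $\varepsilon$ be any given positive number. Since $a$ and $b$ are points of continuity of $\mu$, we can choose $\delta$ such that

$$\mu(a + \delta) - \mu(a - \delta) \le \varepsilon/5,$$
$$\mu(b + \delta) - \mu(b - \delta) \le \varepsilon/5.$$

We then have,

$$\left| \mu(b) - \mu(a) - \frac{1}{\pi} \int_{-\infty}^{\infty} \left[\arctan\left(\frac{b - \lambda}{\eta}\right) - \arctan\left(\frac{a - \lambda}{\eta}\right)\right] d\mu(\lambda) \right|$$

$$\le \frac{2\varepsilon}{5} + \frac{1}{\pi} \int_{b+\delta}^{\infty} \left[\arctan\left(\frac{b - \lambda}{\eta}\right) - \arctan\left(\frac{a - \lambda}{\eta}\right)\right] d\mu(\lambda) + \frac{1}{\pi} \int_{-\infty}^{a-\delta} \left[\arctan\left(\frac{b - \lambda}{\eta}\right) - \arctan\left(\frac{a - \lambda}{\eta}\right)\right] d\mu(\lambda)$$

$$\quad + \frac{1}{\pi} \int_{a+\delta}^{b-\delta} \left[\pi - \arctan\left(\frac{b - \lambda}{\eta}\right) + \arctan\left(\frac{a - \lambda}{\eta}\right)\right] d\mu(\lambda)$$

$$= \frac{2\varepsilon}{5} + \frac{1}{\pi} \int_{b+\delta}^{\infty} \arctan\left(\frac{\eta(b - a)}{\eta^2 + (b - \lambda)(a - \lambda)}\right) d\mu(\lambda) + \frac{1}{\pi} \int_{-\infty}^{a-\delta} \arctan\left(\frac{\eta(b - a)}{\eta^2 + (b - \lambda)(a - \lambda)}\right) d\mu(\lambda)$$

$$\quad + \frac{1}{\pi} \int_{a+\delta}^{b-\delta} \left[\pi - \arctan\left(\frac{b - \lambda}{\eta}\right) + \arctan\left(\frac{a - \lambda}{\eta}\right)\right] d\mu(\lambda).$$

Note that in the two integrals with infinite limits, the arguments of arctan are positive. In the middle integral the variable $\lambda$ runs between $a + \delta$ and $b - \delta$. For such $\lambda$, $\frac{b - \lambda}{\eta} \ge \frac{\delta}{\eta}$ and $\frac{a - \lambda}{\eta} \le -\frac{\delta}{\eta}$. So the right-hand side of the above inequality is dominated by

$$\frac{2\varepsilon}{5} + \frac{\eta}{\pi} \int_{b+\delta}^{\infty} \frac{b - a}{\eta^2 + (b - \lambda)(a - \lambda)}\, d\mu(\lambda) + \frac{\eta}{\pi} \int_{-\infty}^{a-\delta} \frac{b - a}{\eta^2 + (b - \lambda)(a - \lambda)}\, d\mu(\lambda) + \frac{1}{\pi} \int_{a+\delta}^{b-\delta} \left[\pi - 2\arctan\frac{\delta}{\eta}\right] d\mu(\lambda).$$

The first two integrals are finite (because of the properties of $d\mu$). The third one is dominated by $2\left(\frac{\pi}{2} - \arctan\frac{\delta}{\eta}\right)[\mu(b) - \mu(a)]$. So we can choose $\eta$ small enough to make each of the last three terms smaller than $\varepsilon/5$. This proves the theorem. ∎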
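The inversion formula (V.45) can be watched at work on a purely atomic measure. In the sketch below $\mu$ has mass 1 at $\lambda = 0$ and mass 0.5 at $\lambda = 2$, and $\beta = 0$; these choices, the interval $(a,b) = (-1,1)$, the step count, and the function names are all assumptions of this illustration. The smoothed mass is computed both from the arctan expression derived in the proof and by a midpoint rule applied directly to $\mathrm{Im}\,\varphi(x + i\eta)$.

```python
import math

# Illustrative atomic measure mu: list of (location, mass).
atoms = [(0.0, 1.0), (2.0, 0.5)]

def im_phi(x, eta):
    """Im phi(x + i eta) for phi given by (V.42) with beta = 0."""
    return sum(w * eta / ((lam - x) ** 2 + eta ** 2) for lam, w in atoms)

def smoothed_mass_midpoint(a, b, eta, steps=20000):
    """(1/pi) * integral over [a, b] of Im phi(x + i eta), by the midpoint rule."""
    h = (b - a) / steps
    total = sum(im_phi(a + (k + 0.5) * h, eta) for k in range(steps))
    return total * h / math.pi

def smoothed_mass_closed(a, b, eta):
    """The same quantity via the arctan expression obtained in the proof."""
    return sum(w * (math.atan((b - lam) / eta) - math.atan((a - lam) / eta))
               for lam, w in atoms) / math.pi

a, b = -1.0, 1.0        # contains the atom at 0 but not the atom at 2
closed = [smoothed_mass_closed(a, b, eta) for eta in (1e-1, 1e-2, 1e-4)]
numeric = smoothed_mass_midpoint(a, b, 1e-2)

print(closed, numeric)
```

As $\eta$ decreases the values approach 1, the mass that $\mu$ assigns to $(a,b)$; the atom at 2 contributes only a term of order $\eta$.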
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We have shown above that all the terms occurring in the representation 
(V.42) are uniquely determined by the relations (V.43), (V.44), and (V.45). 
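The inversion formula (V.45) can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the proof: it takes the Pick function $-1/\zeta = 1/(0-\zeta)$, whose measure is a unit point mass at 0, and approximates $\frac{1}{\pi}\int_a^b \operatorname{Im}\varphi(x+i\eta)\,dx$ by the midpoint rule for a small $\eta$. The function names, the step count and the value of $\eta$ are hypothetical choices.

```python
import math

def phi(z):
    """The Pick function -1/z; its measure is a unit point mass at 0."""
    return -1.0 / z

def inversion_integral(a, b, eta, steps=60_000):
    """Midpoint-rule value of (1/pi) * integral_a^b Im phi(x + i*eta) dx."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += phi(complex(x, eta)).imag
    return total * h / math.pi

# a and b must be points of continuity of mu, i.e. different from 0 here.
mass_containing_zero = inversion_integral(-1.0, 2.0, eta=1e-3)   # close to mu(2) - mu(-1) = 1
mass_missing_zero = inversion_integral(1.0, 2.0, eta=1e-3)       # close to mu(2) - mu(1) = 0
```

As $\eta$ decreases, the first value approaches 1 and the second approaches 0, in agreement with (V.45).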


Exercise V.4.13 We have proved the relations (V.33), (V.36), (V.41) and
(V.42) in that order. Show that all these are, in fact, equivalent. Hence,
each of these representations is unique.


Proposition V.4.14 A Pick function $\varphi$ is in the class $P(a,b)$ if and only
if the measure $\mu$ associated with it in the representation (V.42) has zero
mass on $(a,b)$.


Proof. Let $\varphi(x+i\eta) = u(x+i\eta) + iv(x+i\eta)$, where $u, v$ are the real
and imaginary parts of $\varphi$. If $\varphi$ can be continued across $(a,b)$, then as $\eta \downarrow 0$,
on any closed subinterval $[c,d]$ of $(a,b)$, $v(x+i\eta)$ converges uniformly to a
bounded continuous function $v(x)$ on $[c,d]$. Hence, by (V.45),

\[
\mu(d) - \mu(c) = \frac{1}{\pi}\int_c^d v(x)\,dx,
\]

i.e., $d\mu(x) = \frac{1}{\pi}v(x)\,dx$. If the analytic continuation to the lower half-plane
is by reflection across $(a,b)$, then $v$ is identically zero on $[c,d]$ and hence so
is $\mu$.

Conversely, if $\mu$ has no mass on $(a,b)$, then for $\zeta$ in $(a,b)$ the integral
in (V.42) is convergent, and is real valued. This shows that the function $\varphi$
can be continued from $H_+$ to $H_-$ across $(a,b)$ by reflection. ∎


The reader should note that the above proposition shows that the con- 
verse of Theorem V.4.7 is also true. 

It should be pointed out that the formula (V.42) defines two analytic
functions, one on $H_+$ and the other on $H_-$. If these are denoted by $\varphi$ and
$\psi$, then $\varphi(\zeta) = \overline{\psi(\bar\zeta)}$. So $\varphi$ and $\psi$ are reflections of each other. But they
need not be analytic continuations of each other. For this to be the case,
the measure $\mu$ should be zero on an interval $(a,b)$ across which the function
can be continued analytically.


Exercise V.4.15 If a function $f$ is operator monotone on the whole real
line, then $f$ must be of the form $f(t) = \alpha + \beta t$, $\alpha \in \mathbb{R}$, $\beta \ge 0$.


Let us now look at a few simple examples. 


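Example V.4.16 The function $\varphi(\zeta) = -\frac{1}{\zeta}$ is a Pick function. For this
function, we see from (V.43) and (V.44) that $\alpha = \beta = 0$. Since $\varphi$ is
analytic everywhere in the plane except at 0, Proposition V.4.14 tells us
that the measure $\mu$ is concentrated at the single point 0.


Example V.4.17 Let $\varphi(\zeta) = \zeta^{1/2}$ be the principal branch of the square
root function. This is a Pick function. From (V.43) we see that

\[
\alpha = \operatorname{Re}\varphi(i) = \operatorname{Re} e^{i\pi/4} = \frac{1}{\sqrt{2}}.
\]

From (V.44) we see that $\beta = 0$. If $\zeta = \lambda + i\eta$ is any complex number, then

\[
\zeta^{1/2} = \left(\frac{|\zeta|+\lambda}{2}\right)^{1/2} + i\,\operatorname{sgn}\eta\,\left(\frac{|\zeta|-\lambda}{2}\right)^{1/2},
\]

where $\operatorname{sgn}\eta$ is the sign of $\eta$, defined to be 1 if $\eta \ge 0$ and $-1$ if $\eta < 0$.
So if $\eta > 0$, we have $\operatorname{Im}\varphi(\zeta) = \left(\frac{|\zeta|-\lambda}{2}\right)^{1/2}$. As $\eta \downarrow 0$, $|\zeta|$ comes closer to
$|\lambda|$. So, $\operatorname{Im}\varphi(\lambda+i\eta)$ converges to 0 if $\lambda > 0$ and to $|\lambda|^{1/2}$ if $\lambda < 0$. Since
$\varphi$ is positive on the right half-axis, the measure $\mu$ has no mass at 0. The
measure can now be determined from (V.45). We have, then

\[
\zeta^{1/2} = \frac{1}{\sqrt{2}} + \int_{-\infty}^{0}\left(\frac{1}{\lambda-\zeta} - \frac{\lambda}{\lambda^2+1}\right)\frac{|\lambda|^{1/2}}{\pi}\,d\lambda. \tag{V.46}
\]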
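The closed formula for $\zeta^{1/2}$ and the boundary behaviour of its imaginary part can be checked against Python's `cmath.sqrt`, which returns the principal branch. This is only an illustrative sketch; the helper name `sqrt_formula` and the sample points are assumptions.

```python
import cmath
import math

def sqrt_formula(zeta):
    """Principal square root via the real/imaginary-part formula of Example V.4.17."""
    lam, eta = zeta.real, zeta.imag
    r = abs(zeta)
    sgn = 1.0 if eta >= 0 else -1.0
    real_part = math.sqrt(max((r + lam) / 2.0, 0.0))   # max() guards against rounding
    imag_part = math.sqrt(max((r - lam) / 2.0, 0.0))
    return complex(real_part, sgn * imag_part)

# Agreement with the library's principal branch at a few sample points.
for zeta in (3 + 4j, 3 - 4j, -1 + 2j, 0.5 + 0.1j):
    assert abs(sqrt_formula(zeta) - cmath.sqrt(zeta)) < 1e-12

# Boundary values of Im zeta^(1/2) as eta decreases to 0.
eta = 1e-9
limit_on_negative_axis = sqrt_formula(complex(-4.0, eta)).imag   # close to |-4|**0.5 = 2
limit_on_positive_axis = sqrt_formula(complex(4.0, eta)).imag    # close to 0
```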


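Example V.4.18 Let $\varphi(\zeta) = \operatorname{Log}\zeta$, where Log is the principal branch
of the logarithm, defined everywhere except on $(-\infty,0]$ by the formula
$\operatorname{Log}\zeta = \ln|\zeta| + i\operatorname{Arg}\zeta$. The function $\operatorname{Arg}\zeta$ is the principal branch of
the argument, taking values in $(-\pi,\pi]$. We then have

\[
\alpha = \operatorname{Re}(\operatorname{Log} i) = 0, \qquad
\beta = \lim_{\eta\to\infty} \frac{\operatorname{Log}(i\eta)}{i\eta} = 0.
\]

As $\eta \downarrow 0$, $\operatorname{Im}(\operatorname{Log}(\lambda+i\eta))$ converges to $\pi$ if $\lambda < 0$ and to 0 if $\lambda > 0$.
So from (V.45) we see that the measure $\mu$ is just the restriction of the
Lebesgue measure to $(-\infty,0]$. Thus,

\[
\operatorname{Log}\zeta = \int_{-\infty}^{0}\left(\frac{1}{\lambda-\zeta} - \frac{\lambda}{\lambda^2+1}\right) d\lambda. \tag{V.47}
\]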
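The representation (V.47) can be tested numerically at a point of the upper half-plane. The sketch below is an assumption-based check, not a proof: it substitutes $\lambda = -s$ and then $s = u/(1-u)$ to map the range of integration to $0 < u < 1$, and applies the midpoint rule. The function name, the step count and the sample point are hypothetical.

```python
import cmath

def log_via_representation(zeta, steps=20_000):
    """Midpoint-rule value of integral_{-inf}^0 (1/(lam - zeta) - lam/(lam^2 + 1)) d lam.

    With lam = -s the integrand becomes -1/(s + zeta) + s/(s^2 + 1) on 0 < s < inf;
    the further substitution s = u/(1 - u) has ds = du/(1 - u)^2.
    """
    h = 1.0 / steps
    total = 0j
    for k in range(steps):
        u = (k + 0.5) * h
        s = u / (1.0 - u)
        integrand = -1.0 / (s + zeta) + s / (s * s + 1.0)
        total += integrand / (1.0 - u) ** 2
    return total * h

zeta = complex(1.0, 1.0)                 # hypothetical sample point with Im zeta > 0
approx = log_via_representation(zeta)
exact = cmath.log(zeta)                  # principal branch of the logarithm
```

The two values agree to within the quadrature error, which is consistent with $\mu$ being Lebesgue measure on the negative half-line.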


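Exercise V.4.19 For $0 < r < 1$, let $\zeta^r$ denote the principal branch of the
function $\varphi(\zeta) = \zeta^r$. Show that

\[
\zeta^r = \cos\frac{r\pi}{2} + \frac{\sin r\pi}{\pi}\int_{-\infty}^{0}\left(\frac{1}{\lambda-\zeta} - \frac{\lambda}{\lambda^2+1}\right)|\lambda|^r\,d\lambda. \tag{V.48}
\]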


This includes (V.46) as a special case.

Let now $f$ be any operator monotone function on $(0,\infty)$. We have seen
above that $f$ must have the form

\[
f(t) = \alpha + \beta t + \int_{-\infty}^{0}\left(\frac{1}{\lambda-t} - \frac{\lambda}{\lambda^2+1}\right) d\mu(\lambda).
\]

By a change of variables we can write this as

\[
f(t) = \alpha + \beta t + \int_0^\infty\left(\frac{\lambda}{\lambda^2+1} - \frac{1}{\lambda+t}\right) d\mu(\lambda), \tag{V.49}
\]

where $\alpha \in \mathbb{R}$, $\beta \ge 0$ and $\mu$ is a positive measure on $(0,\infty)$ such that

\[
\int_0^\infty \frac{1}{\lambda^2+1}\,d\mu(\lambda) < \infty. \tag{V.50}
\]

Suppose $f$ is such that

\[
f(0) := \lim_{t\to 0} f(t) > -\infty. \tag{V.51}
\]

Then, it follows from (V.49) that $\mu$ must also satisfy the condition

\[
\int_0^1 \frac{1}{\lambda}\,d\mu(\lambda) < \infty. \tag{V.52}
\]

We have from (V.49)

\[
f(t) - f(0) = \beta t + \int_0^\infty\left(\frac{1}{\lambda} - \frac{1}{\lambda+t}\right) d\mu(\lambda)
= \beta t + \int_0^\infty \frac{t}{(\lambda+t)\lambda}\,d\mu(\lambda).
\]

Hence, we can write $f$ in the form

\[
f(t) = \gamma + \beta t + \int_0^\infty \frac{\lambda t}{\lambda+t}\,dw(\lambda), \tag{V.53}
\]

where $\gamma = f(0)$ and $dw(\lambda) = \frac{1}{\lambda^2}\,d\mu(\lambda)$. From (V.50) and (V.52), we see
that the measure $w$ satisfies the conditions

\[
\int_0^\infty \frac{\lambda^2}{\lambda^2+1}\,dw(\lambda) < \infty \quad\text{and}\quad \int_0^1 \lambda\,dw(\lambda) < \infty. \tag{V.54}
\]

These two conditions can, equivalently, be expressed as a single condition

\[
\int_0^\infty \frac{\lambda}{1+\lambda}\,dw(\lambda) < \infty. \tag{V.55}
\]

We have thus shown that an operator monotone function on $(0,\infty)$ satisfying
the condition (V.51) has a canonical representation (V.53), where
$\gamma \in \mathbb{R}$, $\beta \ge 0$ and $w$ is a positive measure satisfying (V.55).

The representation (V.53) is often useful for studying operator monotone 
functions on the positive half-line [0, 00). 


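Suppose that we are given a function $f$ as in (V.53). If $w$ satisfies the
conditions (V.54) then

\[
\int_0^\infty\left(\frac{\lambda}{\lambda^2+1} - \frac{1}{\lambda}\right)\lambda^2\,dw(\lambda) > -\infty,
\]

and we can write

\[
f(t) = \left\{\gamma - \int_0^\infty\left(\frac{\lambda}{\lambda^2+1} - \frac{1}{\lambda}\right)\lambda^2\,dw(\lambda)\right\}
+ \beta t + \int_0^\infty\left(\frac{\lambda}{\lambda^2+1} - \frac{1}{\lambda+t}\right)\lambda^2\,dw(\lambda).
\]

So, if we put the number in braces above equal to $\alpha$ and $d\mu(\lambda) = \lambda^2\,dw(\lambda)$,
then we have a representation of $f$ in the form (V.49).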


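Exercise V.4.20 Use the considerations in the preceding paragraphs to
show that, for $0 < r < 1$ and $t > 0$, we have

\[
t^r = \frac{\sin r\pi}{\pi}\int_0^\infty \frac{\lambda t}{\lambda+t}\,\lambda^{r-2}\,d\lambda. \tag{V.56}
\]

(See Exercise V.1.10 also.)


Exercise V.4.21 For $t > 0$, show that

\[
\log(1+t) = \int_1^\infty \frac{\lambda t}{\lambda+t}\,\lambda^{-2}\,d\lambda. \tag{V.57}
\]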
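Both integral formulas (V.56) and (V.57) lend themselves to a quick numerical sanity check. The sketch below is illustrative only and does not replace the exercises; the substitutions $\lambda = e^x$ and $\lambda = 1/u$, the truncation width and the step counts are assumptions made for the sake of a simple quadrature.

```python
import math

def power_via_integral(t, r, half_width=80.0, steps=80_000):
    """(sin(r*pi)/pi) * integral_0^inf (lam*t/(lam+t)) * lam**(r-2) d lam.

    With lam = exp(x) the integrand becomes t*exp(r*x)/(exp(x) + t) on the real
    line; it decays exponentially in both directions, so the trapezoid rule on
    [-half_width, half_width] is adequate for 0 < r < 1.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * t * math.exp(r * x) / (math.exp(x) + t)
    return math.sin(r * math.pi) / math.pi * total * h

def log1p_via_integral(t, steps=10_000):
    """integral_1^inf (lam*t/(lam+t)) * lam**(-2) d lam via lam = 1/u, 0 < u < 1.

    After the substitution the integrand is simply t/(1 + t*u) (midpoint rule).
    """
    h = 1.0 / steps
    return sum(t / (1.0 + t * (k + 0.5) * h) for k in range(steps)) * h

square_root_of_two = power_via_integral(2.0, 0.5)   # compare with 2**0.5
log_of_four = log1p_via_integral(3.0)               # compare with math.log(4)
```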


Appendix 1. Differentiability of Convex Functions 


Let $f$ be a real valued convex function defined on an interval $I$. Then $f$
has some smoothness properties, which are listed below.

The function $f$ is Lipschitz on any closed interval $[a,b]$ contained in $I^0$,
the interior of $I$. So $f$ is continuous on $I^0$.

At every point $x$ in $I^0$, the right and left derivatives of $f$ exist. These
are defined, respectively, as

\[
f'_+(x) := \lim_{y\downarrow x}\frac{f(y)-f(x)}{y-x}, \qquad
f'_-(x) := \lim_{y\uparrow x}\frac{f(y)-f(x)}{y-x}.
\]

Both these functions are monotonically increasing on $I^0$. Further,

\[
\lim_{x\downarrow w} f'_\pm(x) = f'_+(w), \qquad
\lim_{x\uparrow w} f'_\pm(x) = f'_-(w).
\]

The function $f$ is differentiable except on a countable set $E$ in $I^0$, i.e., at
every point $x$ in $I^0\setminus E$ the left and right derivatives of $f$ are equal. Further,
the derivative $f'$ is continuous on $I^0\setminus E$.

If a sequence of convex functions converges at every point of $I$, then the
limit function is convex. The convergence is uniform on any closed interval
$[a,b]$ contained in $I^0$.


Appendix 2. Regularisation of Functions 


The convolution of two functions leads to a new function that inherits 
the stronger of the smoothness properties of the two original functions. 
This is the idea behind “regularisation” of functions. 

Let $\varphi$ be a real function of class $C^\infty$ with the following properties: $\varphi \ge 0$,
$\varphi$ is even, the support $\operatorname{supp}\varphi = [-1,1]$, and $\int\varphi = 1$. For each $\varepsilon > 0$,
let $\varphi_\varepsilon(x) = \frac{1}{\varepsilon}\varphi\left(\frac{x}{\varepsilon}\right)$. Then $\operatorname{supp}\varphi_\varepsilon = [-\varepsilon,\varepsilon]$ and $\varphi_\varepsilon$ has all the other
properties of $\varphi$ listed above. The functions $\varphi_\varepsilon$ are called mollifiers or
smooth approximate identities.


If $f$ is a locally integrable function, we define its regularisation of
order $\varepsilon$ as the function

\[
f_\varepsilon(x) = (f * \varphi_\varepsilon)(x) := \int f(x-y)\varphi_\varepsilon(y)\,dy = \int f(x-\varepsilon t)\varphi(t)\,dt.
\]


The family $f_\varepsilon$ has the following properties.

1. Each $f_\varepsilon$ is a $C^\infty$ function.

2. If the support of $f$ is contained in a compact set $K$, then the support
of $f_\varepsilon$ is contained in an $\varepsilon$-neighbourhood of $K$.

3. If $f$ is continuous at $x_0$, then $\lim_{\varepsilon\downarrow 0} f_\varepsilon(x_0) = f(x_0)$.

4. If $f$ has a discontinuity of the first kind at $x_0$, then
$\lim_{\varepsilon\downarrow 0} f_\varepsilon(x_0) = \frac{1}{2}[f(x_0+) + f(x_0-)]$. (A point $x_0$ is a point of discontinuity of the
first kind if the left and right limits of $f$ at $x_0$ exist; these limits are
denoted as $f(x_0-)$ and $f(x_0+)$, respectively.)

5. If $f$ is continuous, then $f_\varepsilon(x)$ converges to $f(x)$ as $\varepsilon \to 0$. The
convergence is uniform on every compact set.

6. If $f$ is differentiable, then, for every $\varepsilon > 0$, $(f_\varepsilon)' = (f')_\varepsilon$.

7. If $f$ is monotone, then, as $\varepsilon \to 0$, $f'_\varepsilon(x)$ converges to $f'(x)$ at all
points $x$ where $f'(x)$ exists. (Recall that a monotone function can
have discontinuities of the first kind only and is differentiable almost
everywhere.)
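The regularisation can be sketched numerically. The text does not fix a particular $\varphi$; the code below assumes the standard bump $\exp(-1/(1-x^2))$ on $(-1,1)$, normalised numerically, and evaluates $f_\varepsilon(x) = \int f(x-\varepsilon t)\varphi(t)\,dt$ by the midpoint rule. All names and the node count are hypothetical choices. It illustrates properties 3 and 4 for a step function.

```python
import math

def bump(x):
    """Unnormalised C-infinity bump supported on [-1, 1] (an assumed choice of mollifier)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

STEPS = 4000
H = 2.0 / STEPS
NODES = [-1.0 + (k + 0.5) * H for k in range(STEPS)]   # midpoint nodes on [-1, 1]
NORMALISER = sum(bump(t) for t in NODES) * H           # makes the mollifier integrate to 1

def regularise(f, x, eps):
    """f_eps(x) = integral f(x - eps*t) * phi(t) dt with phi = bump / NORMALISER."""
    return sum(f(x - eps * t) * bump(t) for t in NODES) * H / NORMALISER

def step(x):
    """Monotone function with a jump at 0: f(0-) = 0 and f(0+) = 1."""
    return 1.0 if x > 0 else 0.0

value_at_jump = regularise(step, 0.0, eps=0.1)         # average of the one-sided limits
value_right_of_jump = regularise(step, 0.5, eps=0.1)   # f is continuous at 0.5
```

Because $\varphi$ is even, the value at the jump is the mean of the two one-sided limits for every $\varepsilon$, not only in the limit.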


V.5 Problems 


Problem V.5.1. Show that the function f(t) = exp t is neither operator 
monotone nor operator convex on any interval. 


Problem V.5.2. Let $f(t) = \frac{at+b}{ct+d}$, where $a,b,c,d$ are real numbers such
that $ad - bc > 0$. Show that $f$ is operator monotone on every interval that
does not contain the point $-\frac{d}{c}$.


Problem V.5.3. Show that the derivative of an operator convex function 
need not be operator monotone. 


Problem V.5.4. Show that for $r < -1$, the function $f(t) = t^r$ on $(0,\infty)$
is not operator convex. (Hint: The function $f^{[1]}(1,t)$ cannot be continued
analytically to a Pick function.) Together with the assertion in Exercise
V.2.11, this shows that on the half-line $(0,\infty)$ the function $f(t) = t^r$ is
operator convex if $-1 \le r \le 0$ or if $1 \le r \le 2$; and it is not operator
convex for any other real $r$.


Problem V.5.5. A function $g$ on $(0,\infty)$ is operator convex if and only if
it is of the form

\[
g(t) = \alpha + \beta t + \gamma t^2 + \int_0^\infty \frac{\lambda t^2}{\lambda+t}\,d\mu(\lambda),
\]

where $\alpha, \beta$ are real numbers, $\gamma \ge 0$, and $\mu$ is a positive finite measure.


Problem V.5.6. Let $f$ be an operator monotone function on $(0,\infty)$. Then
$(-1)^{n-1}f^{(n)}(t) \ge 0$ for $n = 1,2,\ldots$. [A function $g$ on $(0,\infty)$ is said to
be completely monotone if for all $n \ge 0$, $(-1)^n g^{(n)}(t) \ge 0$. There
is a theorem of S.N. Bernstein that says that a function $g$ is completely
monotone if and only if there exists a positive measure $\mu$ such that
$g(t) = \int_0^\infty e^{-\lambda t}\,d\mu(\lambda)$.] The result of this problem says that the derivative of an
operator monotone function on $(0,\infty)$ is completely monotone. Thus, $f$
has a Taylor expansion $f(t) = \sum_{n=0}^{\infty} a_n(t-1)^n$, in which the coefficients $a_n$
are positive for all odd $n$ and negative for all even $n$.


Problem V.5.7. Let $f$ be a function mapping $(0,\infty)$ into itself. Let
$g(t) = [f(t^{-1})]^{-1}$. Show that if $f$ is operator monotone, then $g$ is also operator
monotone. If $f$ is operator convex and $f(0) = 0$, then $g$ is operator convex.


Problem V.5.8. Show that the function $f(\zeta) = -\cot\zeta$ is a Pick function.
Show that in its canonical representation (V.42), $\alpha = \beta = 0$ and the
measure $\mu$ is atomic with mass 1 at the points $n\pi$ for every integer $n$.
Thus, we have the familiar series expansion

\[
-\cot\zeta = \sum_{n=-\infty}^{\infty}\left[\frac{1}{n\pi-\zeta} - \frac{n\pi}{n^2\pi^2+1}\right].
\]

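Problem V.5.9. The aim of this problem is to show that if a Pick function
$\varphi$ satisfies the growth restriction

\[
\sup_{\eta>0}|\eta\,\varphi(i\eta)| < \infty, \tag{V.58}
\]

then its representation (V.42) takes the simple form

\[
\varphi(\zeta) = \int_{-\infty}^{\infty}\frac{1}{\lambda-\zeta}\,d\mu(\lambda), \tag{V.59}
\]

where $\mu$ is a finite measure.
To see this, start with the representation (V.41). The condition (V.58)
implies the existence of a constant $M$ that bounds, for all $\eta > 0$, the
quantity $\eta\varphi(i\eta)$, and hence also its real and imaginary parts. This gives
two inequalities:

\[
\left|\alpha\eta + \int_{-\infty}^{\infty}\frac{\eta(1-\eta^2)\lambda}{\lambda^2+\eta^2}\,d\nu(\lambda)\right| \le M,
\]

\[
\left|\beta\eta^2 + \eta^2\int_{-\infty}^{\infty}\frac{1+\lambda^2}{\lambda^2+\eta^2}\,d\nu(\lambda)\right| \le M.
\]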


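From the first, conclude that

\[
\alpha = \lim_{\eta\to\infty}\int_{-\infty}^{\infty}\frac{(\eta^2-1)\lambda}{\lambda^2+\eta^2}\,d\nu(\lambda) = \int_{-\infty}^{\infty}\lambda\,d\nu(\lambda).
\]

From the second, conclude that $\beta = 0$ and

\[
\int_{-\infty}^{\infty}\frac{\eta^2}{\lambda^2+\eta^2}\,(1+\lambda^2)\,d\nu(\lambda) \le M.
\]

Taking limits as $\eta \to \infty$, this gives

\[
\int_{-\infty}^{\infty}(1+\lambda^2)\,d\nu(\lambda) = \int_{-\infty}^{\infty} d\mu(\lambda) \le M.
\]

Thus, $\mu$ is a finite measure. From (V.41), we get

\[
\varphi(\zeta) = \int_{-\infty}^{\infty}\lambda\,d\nu(\lambda) + \int_{-\infty}^{\infty}\frac{1+\lambda\zeta}{\lambda-\zeta}\,d\nu(\lambda).
\]

Since $\lambda + \frac{1+\lambda\zeta}{\lambda-\zeta} = \frac{1+\lambda^2}{\lambda-\zeta}$, this is the same as (V.59).

Conversely, observe that if $\varphi$ has a representation like (V.59), then it
must satisfy the condition (V.58).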


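Problem V.5.10. Let $f$ be a function on $(0,\infty)$ such that

\[
f(t) = \alpha + \beta t - \int_0^\infty \frac{1}{\lambda+t}\,d\mu(\lambda),
\]

where $\alpha \in \mathbb{R}$, $\beta \ge 0$ and $\mu$ is a positive measure such that $\int \frac{1}{\lambda}\,d\mu(\lambda) < \infty$.
Then $f$ is operator monotone. Find operator monotone functions that cannot
be expressed in this form.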


V.6 Notes and References 


Operator monotone functions were first studied in detail by K. Löwner
(C. Loewner) in a seminal paper Über monotone Matrixfunktionen, Math.
Z., 38 (1934) 177-216. In this paper, he established the connection between
operator monotonicity, the positivity of the matrix of divided differences
(Theorem V.3.4), and Pick functions. He also noted that the functions
$f(t) = t^r$, $0 \le r \le 1$, and $f(t) = \log t$ are operator monotone on $(0,\infty)$.

Operator convex functions were studied, soon afterwards, by F. Kraus, Über
konvexe Matrixfunktionen, Math. Z., 41 (1936) 18-42.

In another well-known paper, Beiträge zur Störungstheorie der Spektralzerlegung,
Math. Ann., 123 (1951) 415-438, E. Heinz used the theory of
operator monotone functions to study several problems of perturbation
theory for bounded and unbounded operators. The integral representation
(V.41) in this context seems to have been first used by him. The operator
monotonicity of the map $A \to A^r$ for $0 \le r \le 1$ is sometimes called the
“Loewner-Heinz inequality”, although it was discovered by Loewner.

J. Bendat and S. Sherman, Monotone and convex operator functions, 
Trans. Amer. Math. Soc., 79(1955) 58-71, provided a new perspective on 
the theorems of Loewner and Kraus. Theorem V.4.4 was first proved by 
them, and used to give a proof of Loewner’s theorems. 

A completely different and extremely elegant proof of Loewner’s Theorem,
based on the spectral theorem for (unbounded) selfadjoint operators,
was given by A. Korányi, On a theorem of Löwner and its connections with
resolvents of selfadjoint transformations, Acta Sci. Math. Szeged, 17 (1956)
63-70.

Formulas like (V.13) and (V.22) were proved by Ju. L. Daleckii and S.G. 
Krein, Formulas of differentiation according to a parameter of functions 
of Hermitian operators, Dokl. Akad. Nauk SSSR, 76 (1951) 13-16. It was 
pointed out by M.G. Krein that the resulting Taylor formula could be used 
to derive conditions for operator monotonicity. 

A concise presentation of the main ideas of operator monotonicity and 
convexity, including the approach of Daleckii and Krein, was given by 
C. Davis, Notions generalizing convexity for functions defined on spaces
of matrices, in Convexity: Proceedings of Symposia in Pure Mathematics, 
American Mathematical Society, 1963, pp. 187-201. This paper also dis- 
cussed other notions of convexity, examples and counterexamples, and was 
very influential. 

A full book devoted to this topic is Monotone Matrix Functions and 
Analytic Continuation, by W.F. Donoghue, Springer-Verlag, 1974. Several 
ramifications of the theory and its connections with classical real and com- 
plex analysis are discussed here. 

In a set of mimeographed lecture notes, Topics on Operator Inequalities,
Hokkaido University, Sapporo, 1978, T. Ando provided a very concise modern
survey of operator monotone and operator convex functions. Anyone
who wishes to learn the Korányi method mentioned above should certainly
read these notes.

A short proof of Löwner’s Theorem appeared in G. Sparr, A new proof of
Löwner’s theorem on monotone matrix functions, Math. Scand., 47 (1980)
266-274.

In another brief and attractive paper, Jensen’s inequality for operators
and Löwner’s theorem, Math. Ann., 258 (1982) 229-241, F. Hansen and
G.K. Pedersen provided another approach.

Much of Sections 2, 3, and 4 is based on this paper of Hansen and
Pedersen. For the latter parts of Section 4 we have followed Donoghue. We
have also borrowed freely from Ando and from Davis. Our proof of Theorem
V.1.9 is taken from M. Fujii and T. Furuta, Löwner-Heinz, Cordes
and Heinz-Kato inequalities, Math. Japonica, 38 (1993) 73-78. Characterisations
of operator convexity like the one in Exercise V.3.15 may be found
in J.S. Aujla and H.L. Vasudeva, Convex and monotone operator functions,
Ann. Polonici Math., 62 (1995) 1-11.

Operator monotone and operator convex functions are studied in R.A. 
Horn and C.R. Johnson, Topics in Matrix Analysis, Chapter 6. See also the 
interesting paper R.A. Horn, The Hadamard product, in C.R. Johnson, ed. 
Matrix Theory and Applications, American Mathematical Society, 1990.

A short, but interesting, section of the Marshall-Olkin book (cited in 
Chapter 2) is devoted to this topic. Especially interesting are some of the 
examples and connections with statistics that they give. 

Among several applications of these ideas, there are two that we should 
mention here. Operator monotone functions arise often in the study of 
electrical networks. See, e.g., W.N. Anderson and G.E. Trapp, A class of 
monotone operator functions related to electrical network theory, Linear 
Algebra Appl., 15(1975) 53-67. They also occur in problems related to 
elementary particles. See, e.g., E. Wigner and J. von Neumann, Significance 
of Löwner’s theorem in the quantum theory of collisions, Ann. of Math., 59
(1954) 418-433. 

There are important notions of means of operators that are useful in 
the analysis of electrical networks and in quantum physics. An axiomatic 
approach to the study of these means was introduced by F. Kubo and 
T. Ando, Means of positive linear operators, Math. Ann., 249 (1980) 205- 
224. They establish a one-to-one correspondence between the class of operator
monotone functions $f$ on $[0,\infty)$ with $f(1) = 1$ and the class of operator
means.


VI


Spectral Variation of Normal Matrices


Let $A$ be an $n \times n$ Hermitian matrix, and let $\lambda_1^{\downarrow}(A) \ge \lambda_2^{\downarrow}(A) \ge \cdots \ge \lambda_n^{\downarrow}(A)$
be the eigenvalues of $A$ arranged in decreasing order. In Chapter III we
saw that $\lambda_j^{\downarrow}(A)$, $1 \le j \le n$, are continuous functions on the space of Hermitian
matrices. This is a very special consequence of Weyl’s Perturbation
Theorem: if $A, B$ are two Hermitian matrices, then

\[
\max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|.
\]

In turn, this inequality is a special case of the inequality (IV.62), which
says that if $\operatorname{Eig}^{\downarrow}(A)$ denotes the diagonal matrix with entries $\lambda_j^{\downarrow}(A)$ down
its diagonal, then we have

\[
|||\operatorname{Eig}^{\downarrow}(A) - \operatorname{Eig}^{\downarrow}(B)||| \le |||A - B|||
\]

for all Hermitian matrices $A, B$ and for all unitarily invariant norms.

In this chapter we explore how far these results can be carried over to 
normal matrices. The first difficulty we face is that, if the matrices are 
not Hermitian, there is no natural way to order their eigenvalues. So, the 
problem has to be formulated in terms of optimal matchings. Even after 
this has been done, analogues of the inequalities above turn out to be
a little more complicated. Though several good results are known, many
await discovery.


VI.1 Continuity of Roots of Polynomials


Every polynomial of degree n with complex coefficients has n complex roots. 
These are unique, except for an ordering. It is thus natural to think of them 
as an unordered n-tuple of complex numbers. The space of such n-tuples is 
denoted by $\mathbb{C}^n_{\mathrm{sym}}$. This is the quotient space obtained from the space $\mathbb{C}^n$
via the equivalence relation that identifies two $n$-tuples if their coordinates
are permutations of each other. The space $\mathbb{C}^n_{\mathrm{sym}}$ thus inherits a natural
quotient topology from $\mathbb{C}^n$. It also has a natural metric: if $\lambda = \{\lambda_1,\ldots,\lambda_n\}$
and $\mu = \{\mu_1,\ldots,\mu_n\}$ are two points in $\mathbb{C}^n_{\mathrm{sym}}$, then

\[
d(\lambda,\mu) = \min_{\sigma}\max_{1\le j\le n}|\lambda_j - \mu_{\sigma(j)}|,
\]

where the minimum is taken over all permutations. See Problem II.5.9.
This metric is called the optimal matching distance between $\lambda$ and $\mu$.
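For small $n$ the optimal matching distance can be computed directly from its definition. The following brute-force sketch (cost $n!$, so only suitable for illustration) uses a hypothetical function name and arbitrary sample tuples.

```python
from itertools import permutations

def optimal_matching_distance(lam, mu):
    """min over permutations sigma of max_j |lam[j] - mu[sigma(j)]| (brute force)."""
    if len(lam) != len(mu):
        raise ValueError("both tuples must have the same length")
    return min(
        max((abs(x - y) for x, y in zip(lam, perm)), default=0.0)
        for perm in permutations(mu)
    )

# Hypothetical sample tuples: the best pairing is 1 with 1.2, i with 0.9i, -1 with -1.1.
lam = [1, 1j, -1]
mu = [-1.1, 1.2, 0.9j]
distance = optimal_matching_distance(lam, mu)
```

Since the minimum runs over all permutations, the result does not depend on the order in which either tuple is listed, as required for a function on unordered tuples.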


Exercise VI.1.1 Show that the quotient topology on $\mathbb{C}^n_{\mathrm{sym}}$ and the metric
topology generated by the optimal matching distance are identical.


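Recall that, if

\[
f(z) = z^n - a_1 z^{n-1} + a_2 z^{n-2} + \cdots + (-1)^n a_n \tag{VI.1}
\]

is a monic polynomial with roots $\alpha_1,\ldots,\alpha_n$, then the coefficients $a_j$ are
elementary symmetric polynomials in the variables $\alpha_1,\ldots,\alpha_n$, i.e.,

\[
a_j = \sum_{1\le i_1<\cdots<i_j\le n}\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_j}. \tag{VI.2}
\]

By the Fundamental Theorem of Algebra, we have a bijection $S : \mathbb{C}^n_{\mathrm{sym}} \to \mathbb{C}^n$
defined as

\[
S(\{\alpha_1,\ldots,\alpha_n\}) = (a_1,\ldots,a_n). \tag{VI.3}
\]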
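The map $S$ is easy to compute. A minimal sketch, with hypothetical function names, builds the elementary symmetric polynomials (VI.2) one root at a time and evaluates the polynomial (VI.1) with its alternating signs.

```python
def coefficients_from_roots(roots):
    """Return (a_1, ..., a_n) with prod (z - alpha_i) = z^n - a_1 z^(n-1) + ... + (-1)^n a_n."""
    # e[j] holds the j-th elementary symmetric polynomial of the roots processed so far.
    e = [1]
    for alpha in roots:
        e = [1] + [e[j] + alpha * e[j - 1] for j in range(1, len(e))] + [alpha * e[-1]]
    return e[1:]

def evaluate(coeffs, z):
    """Evaluate z^n - a_1 z^(n-1) + a_2 z^(n-2) - ... + (-1)^n a_n by Horner's rule."""
    value = 1
    for j, a in enumerate(coeffs, start=1):
        value = value * z + (-1) ** j * a
    return value

coeffs = coefficients_from_roots([1, 2, 3])   # z^3 - 6 z^2 + 11 z - 6
```

The output does not depend on the order of the roots, which is why $S$ is well defined on unordered tuples.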


Clearly $S$ is continuous, by the definition of the quotient topology. We will
show that $S^{-1}$ is also continuous. For this we have to show that for every
$\varepsilon > 0$, there exists $\delta > 0$ such that if $|a_j - b_j| < \delta$ for all $j$, then the optimal
matching distance between the roots of the monic polynomials that have $a_j$
and $b_j$ as their coefficients is smaller than $\varepsilon$. Let $\xi_1,\ldots,\xi_k$ be the distinct
roots of the monic polynomial $f$ that has coefficients $a_j$. Given $\varepsilon > 0$, we
can choose circles $\Gamma_j$, $1 \le j \le k$, centred at $\xi_j$, each having radius smaller
than $\varepsilon$ and such that none of them intersects any other. Let $\Gamma$ be the union
of the boundaries of all these circles. Let $\eta = \inf_{z\in\Gamma}|f(z)|$. Then $\eta > 0$. Since
$\Gamma$ is a compact set, there exists a positive number $\delta$ such that if $g$ is any
monic polynomial with coefficients $b_j$, and $|a_j - b_j| < \delta$ for all $j$, then
$|f(z) - g(z)| < \eta$ for all $z \in \Gamma$. So, by Rouché’s Theorem $f$ and $g$ have
the same number of zeroes inside each $\Gamma_j$, where the zeroes are counted
with multiplicities. Thus we can pair each root of $f$ with a root of $g$ in
such a way that the distance between the two roots in each pair is smaller than $\varepsilon$. In
other words, the optimal matching distance between the roots of $f$ and $g$
is smaller than $\varepsilon$. We have thus proved the following.


Theorem VI.1.2 The map $S$ is a homeomorphism between $\mathbb{C}^n_{\mathrm{sym}}$ and $\mathbb{C}^n$.


The continuity of $S^{-1}$ means that the roots of a polynomial vary continuously
with the coefficients. Since the coefficients of its characteristic polynomial
change continuously with a matrix, it follows that the eigenvalues
of a matrix also vary continuously. More precisely, the map $M(n) \to \mathbb{C}^n_{\mathrm{sym}}$
that takes a matrix to the unordered tuple of its eigenvalues is continuous.

A different kind of continuity question is the following. If $z \to A(z)$ is a
continuous map from a domain $G$ in the complex plane into $M(n)$, then
do there exist $n$ continuous functions $\lambda_1(z),\ldots,\lambda_n(z)$ on $G$ such that for
each $z$ they are the eigenvalues of the matrix $A(z)$? The example below
shows that this is not always the case.


Example VI.1.3 Let $A(z) = \begin{pmatrix} 0 & z \\ 1 & 0 \end{pmatrix}$. The eigenvalues of $A(z)$ are
$\pm z^{1/2}$. These cannot be represented by two single valued continuous functions
on any domain $G$ that contains zero.
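The obstruction in this example can be seen by following one eigenvalue continuously around the unit circle. The sketch below is an illustration under stated assumptions: it starts from the eigenvalue $+1$ of $A(1)$ and, at each small step along $z = e^{i\theta}$, picks whichever of the two eigenvalues $\pm z^{1/2}$ is closer to the previous choice. The function name and the step count are hypothetical.

```python
import cmath
import math

def track_eigenvalue(steps=1000):
    """Follow one eigenvalue of A(z) = [[0, z], [1, 0]] along z = exp(i*theta), 0 <= theta <= 2*pi."""
    current = 1.0 + 0j                       # the eigenvalue +1 of A(1)
    for k in range(1, steps + 1):
        z = cmath.exp(1j * 2.0 * math.pi * k / steps)
        root = cmath.sqrt(z)                 # the eigenvalues of A(z) are root and -root
        # Continuity: keep the eigenvalue nearest to the one chosen at the previous step.
        current = root if abs(root - current) <= abs(-root - current) else -root
    return current

end_value = track_eigenvalue()               # close to -1, although A(z) has returned to A(1)
```

After one full turn the matrix is back at $A(1)$, yet the continuously tracked eigenvalue has moved from $+1$ to $-1$, so no single valued continuous choice exists on a domain surrounding zero.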


In two special situations, the answer to the question raised above is in 
the affirmative. If either the eigenvalues of A(z) are all real, or if G is an 
interval on the real line, a continuous parametrisation of the eigenvalues of 
A(z) is possible. This is shown below. 


Consider the map from $\mathbb{R}^n_{\mathrm{sym}}$ to $\mathbb{R}^n$ that rearranges an unordered $n$-tuple
$\{\lambda_1,\ldots,\lambda_n\}$ in decreasing order as $(\lambda_1^{\downarrow},\ldots,\lambda_n^{\downarrow})$. From the majorisation
relation (II.35) it follows that this map reduces distances, i.e.,

\[
\max_{1\le j\le n}|\lambda_j^{\downarrow} - \mu_j^{\downarrow}| \le d(\lambda,\mu).
\]

Hence, in particular, this is a continuous map. So, if all the eigenvalues of
$A(z)$ are real, enumerating them as $\lambda_1^{\downarrow}(z) \ge \cdots \ge \lambda_n^{\downarrow}(z)$ gives a continuous
parametrisation for them. We should remark that while this is the most
natural way of ordering real $n$-tuples, it is not always the most convenient.
It could destroy the differentiability of these functions, which some other
ordering might confer on them. For example, on any interval containing
0 the two functions $\pm t$ are differentiable. But rearrangement in the way
above leads to the functions $\pm|t|$, which are not differentiable at 0.
For maps from an interval we have the following.
For maps from an interval we have the following. 


Theorem VI.1.4 Let $\lambda$ be a continuous map from an interval $I$ into the
space $\mathbb{C}^n_{\mathrm{sym}}$. Then there exist $n$ continuous complex functions $\lambda_j(t)$ on $I$
such that $\lambda(t) = \{\lambda_1(t),\ldots,\lambda_n(t)\}$ for each $t \in I$.


Proof. For brevity we will call $n$ functions whose existence is asserted by
the theorem a continuous selection for $\lambda$. Suppose a continuous selection
$\lambda_j^{(1)}(t)$ exists on a subinterval $I_1$ and another continuous selection $\lambda_j^{(2)}(t)$
exists on a subinterval $I_2$. If $I_1$ and $I_2$ have a common point $t_0$, then
$\{\lambda_j^{(1)}(t_0)\}$ and $\{\lambda_j^{(2)}(t_0)\}$ are identical up to a permutation. So a continuous
selection exists on $I_1 \cup I_2$.

It follows that, if $J$ is a subinterval of $I$ such that each point of $J$ has
a neighbourhood on which a continuous selection exists, then a continuous
selection exists on the entire interval $J$.

Now we can prove the theorem by induction on $n$. The statement is
obviously true for $n = 1$. Suppose it is true for dimensions smaller than
$n$. Let $K$ be the set of all $t \in I$ for which all the $n$ elements of $\lambda(t)$ are
equal. Then $K$ is a closed subset of $I$. Let $L = I\setminus K$. Let $t_0 \in L$. Then
$\lambda(t_0)$ has at least two distinct elements. Collect all the copies of one of
these elements. If these are $k$ in number (i.e., $k$ is the multiplicity of the
chosen element), then the $n$ elements of $\lambda(t_0)$ are now divided into two
groups with $k$ and $n-k$ elements, respectively. These two groups have no
element in common. Since $\lambda(t)$ is continuous, for $t$ sufficiently close to $t_0$
the elements of $\lambda(t)$ also split into two groups of $k$ and $n-k$ elements,
each of which is continuous in $t$. By the induction hypothesis, each of these
groups has a continuous selection in a neighbourhood of $t_0$. Taken together,
they provide a continuous selection for $\lambda$ in this neighbourhood.

So, a continuous selection exists on each component of $L$. On its complement
$K$, $\lambda(t)$ consists of just one element $\lambda(t)$ repeated $n$ times. Putting
these together we obtain a continuous selection for $\lambda(t)$ on all of $I$. ∎


Corollary VI.1.5 Let $a_j(t)$, $1 \le j \le n$, be continuous complex valued
functions defined on an interval $I$. Then there exist continuous functions
$\alpha_1(t),\ldots,\alpha_n(t)$ that, for each $t \in I$, constitute the roots of the monic
polynomial $z^n - a_1(t)z^{n-1} + \cdots + (-1)^n a_n(t)$.


Corollary VI.1.6 Let $t \to A(t)$ be a continuous map from an interval $I$
into the space of $n \times n$ matrices. Then there exist continuous functions
$\lambda_1(t),\ldots,\lambda_n(t)$ that, for each $t \in I$, are the eigenvalues of $A(t)$.


VI.2 Hermitian and Skew-Hermitian Matrices 


In this section we derive some bounds for the distance between the eigen- 
values of a Hermitian matrix A and those of a skew-Hermitian matrix B. 
This will reveal several new facets of the general problem that are quite 
different from the case when both A, B are Hermitian. 

Let us recall here, once again, the theorem that is the prototype of the 
results we seek. 




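Theorem VI.2.1 (Weyl’s Perturbation Theorem) Let $A, B$ be Hermitian
matrices with eigenvalues $\lambda_1^{\downarrow}(A) \ge \cdots \ge \lambda_n^{\downarrow}(A)$ and $\lambda_1^{\downarrow}(B) \ge \cdots \ge \lambda_n^{\downarrow}(B)$,
respectively. Then

\[
\max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\downarrow}(B)| \le \|A - B\|. \tag{VI.4}
\]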


We have seen two different proofs of this, one in Section III.2 and the 
other in Section IV.3. It is the latter idea which, in modified forms, will be 
used often in the following paragraphs. 


Theorem VI.2.2 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix.
Let their eigenvalues $\alpha_1,\ldots,\alpha_n$ and $\beta_1,\ldots,\beta_n$ be arranged in such a way
that

\[
|\alpha_1| \ge \cdots \ge |\alpha_n| \quad\text{and}\quad |\beta_1| \ge \cdots \ge |\beta_n|. \tag{VI.5}
\]

Then

\[
\max_j |\alpha_j - \beta_{n-j+1}| \le \|A - B\|. \tag{VI.6}
\]


Proof. For a fixed index $j$, consider the eigenspaces of $A$ and $B$ corresponding
to their eigenvalues $\{\alpha_1,\ldots,\alpha_j\}$ and $\{\beta_1,\ldots,\beta_{n-j+1}\}$, respectively.
Let $x$ be a unit vector in their intersection. Then

\[
\begin{aligned}
\|A - B\|^2 &= \tfrac{1}{2}\left(\|A - B\|^2 + \|A + B\|^2\right) \\
&\ge \tfrac{1}{2}\left(\|(A - B)x\|^2 + \|(A + B)x\|^2\right) \\
&= \|Ax\|^2 + \|Bx\|^2 \\
&\ge |\alpha_j|^2 + |\beta_{n-j+1}|^2 = |\alpha_j - \beta_{n-j+1}|^2.
\end{aligned}
\]

At the first step above, we used the equality $\|T\| = \|T^*\|$ valid for all $T$; at
the third step we used the parallelogram law, and at the last step the fact
that $\alpha_j$ is real and $\beta_{n-j+1}$ is imaginary. ∎


For Hermitian pairs A, B we have seen analogues of the inequality (VI.4) 
for other unitarily invariant norms. It is, therefore, natural to ask for similar 
kinds of results when A is Hermitian and B skew-Hermitian. 

It is convenient to do this in the following setup. Let $T$ be any matrix
and let $T = A + iB$ be its Cartesian decomposition into real and imaginary
parts, $A = \frac{T + T^*}{2}$ and $B = \frac{T - T^*}{2i}$. The theorem below gives majorisation
relations between the eigenvalues of $A$ and $B$, and the singular values of
$T$. From these several inequalities can be obtained.

We will use the notation $\{x_j\}_j$ to mean an $n$-vector whose $j$th coordinate
is $x_j$.


Theorem VI.2.3 Let $A, B$ be Hermitian matrices with eigenvalues $\alpha_j$ and
$\beta_j$, respectively, ordered so that

\[
|\alpha_1| \ge \cdots \ge |\alpha_n| \quad\text{and}\quad |\beta_1| \ge \cdots \ge |\beta_n|.
\]

Let $T = A + iB$, and let $s_j$ be the singular values of $T$. Then the following
majorisation relations are satisfied:

\[
\{|\alpha_j + i\beta_{n-j+1}|^2\}_j \prec \{s_j^2\}_j, \tag{VI.7}
\]

\[
\{\tfrac{1}{2}(s_j^2 + s_{n-j+1}^2)\}_j \prec \{|\alpha_j + i\beta_j|^2\}_j. \tag{VI.8}
\]


Proof. For any two Hermitian matrices $X, Y$ we have the majorisations
(III.13):

\[
\lambda^{\downarrow}(X) + \lambda^{\uparrow}(Y) \prec \lambda(X + Y) \prec \lambda^{\downarrow}(X) + \lambda^{\downarrow}(Y).
\]

Choosing $X = A^2$, $Y = B^2$, this gives

\[
\{|\alpha_j + i\beta_{n-j+1}|^2\}_j \prec \{s_j(A^2 + B^2)\}_j \prec \{|\alpha_j + i\beta_j|^2\}_j. \tag{VI.9}
\]

Now note that

\[
A^2 + B^2 = \tfrac{1}{2}(T^*T + TT^*)
\]

and

\[
s_j(T^*T) = s_j(TT^*) = s_j^2.
\]

So, choosing $X = \frac{T^*T}{2}$ and $Y = \frac{TT^*}{2}$ in the first majorisation above gives

\[
\tfrac{1}{2}\{s_j^2 + s_{n-j+1}^2\}_j \prec \{s_j(A^2 + B^2)\}_j \prec \{s_j^2\}_j. \tag{VI.10}
\]

Since majorisation is a transitive relation, the two assertions (VI.7) and
(VI.8) follow from (VI.9) and (VI.10). ∎


For each $p \ge 2$, the function $\varphi(t) = t^{p/2}$ is convex on $[0,\infty)$. So, by
Corollary II.3.4, we obtain from (VI.7) and (VI.8) the weak majorisations

\[
\{|\alpha_j + i\beta_{n-j+1}|^p\}_j \prec_w \{s_j^p\}_j, \tag{VI.11}
\]

\[
\left\{\frac{1}{2^{p/2}}(s_j^2 + s_{n-j+1}^2)^{p/2}\right\}_j \prec_w \{|\alpha_j + i\beta_j|^p\}_j. \tag{VI.12}
\]

These two relations include the inequalities

\[
\sum_{j=1}^n |\alpha_j + i\beta_{n-j+1}|^p \le \sum_{j=1}^n s_j^p, \tag{VI.13}
\]

\[
\frac{1}{2^{p/2}}\sum_{j=1}^n (s_j^2 + s_{n-j+1}^2)^{p/2} \le \sum_{j=1}^n |\alpha_j + i\beta_j|^p \tag{VI.14}
\]

for $p \ge 2$.

If $a_1$ and $a_2$ are any two nonnegative real numbers, then the function
$g(t) = (a_1^t + a_2^t)^{1/t}$ is monotonically decreasing on $0 < t < \infty$. So if $p \ge 2$,
then

\[
a_1^p + a_2^p \le (a_1^2 + a_2^2)^{p/2}. \tag{VI.15}
\]

Using this we get from (VI.14) the inequality

\[
2^{1-p/2}\sum_{j=1}^n s_j^p \le \sum_{j=1}^n |\alpha_j + i\beta_j|^p \tag{VI.16}
\]

for $p \ge 2$.
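The elementary inequality (VI.15) and the monotonicity of $g$ behind it can be spot-checked numerically. The sample values below are arbitrary assumptions; a check of this kind illustrates the inequality and does not prove it.

```python
def power_mean_sum(a1, a2, t):
    """g(t) = (a1**t + a2**t)**(1/t) for nonnegative a1, a2 and t > 0."""
    return (a1 ** t + a2 ** t) ** (1.0 / t)

a1, a2 = 0.7, 1.9                                       # hypothetical nonnegative sample values
values = [power_mean_sum(a1, a2, t) for t in (0.5, 1.0, 2.0, 3.0, 6.0)]
assert all(x >= y for x, y in zip(values, values[1:]))  # g is decreasing in t

for p in (2.0, 3.0, 6.0):
    # (VI.15): a1^p + a2^p <= (a1^2 + a2^2)^(p/2) for p >= 2
    assert a1 ** p + a2 ** p <= (a1 ** 2 + a2 ** 2) ** (p / 2) + 1e-12
```

Applying (VI.15) with $a_1 = s_j$ and $a_2 = s_{n-j+1}$ and summing over $j$ counts each $s_j^p$ twice, which is where the factor $2^{1-p/2}$ in (VI.16) comes from.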


Exercise VI.2.4 For $0 < p \le 2$, the function $\varphi(t) = t^{p/2}$ is concave on
$[0,\infty)$. Use this to show that for these values of $p$, the weak majorisations
(VI.11) and (VI.12) are valid with $\prec_w$ replaced by $\prec^w$. All the four
inequalities (VI.13)-(VI.16) now go in the opposite direction.


Let $A$ be any matrix with eigenvalues $\alpha_1,\ldots,\alpha_n$, counted in any order.
We have used the notation $\operatorname{Eig} A$ to mean a diagonal matrix that has entries
$\alpha_j$ down its diagonal. If $\sigma$ is a permutation, we will use the notation $\operatorname{Eig}_\sigma(A)$
for the diagonal matrix with entries $\alpha_{\sigma(1)},\ldots,\alpha_{\sigma(n)}$ down its diagonal. The
symbol $\operatorname{Eig}^{|\downarrow|}(A)$ will mean the diagonal matrix whose diagonal entries
are the eigenvalues of $A$ in decreasing order of magnitude, i.e., the $\alpha_j$
arranged so that $|\alpha_1| \ge \cdots \ge |\alpha_n|$. In the same way, $\operatorname{Eig}^{|\uparrow|}(A)$ will stand
for the diagonal matrix whose diagonal entries are the eigenvalues of $A$
arranged in increasing order of magnitude, i.e., the $\alpha_j$ rearranged so that
$|\alpha_1| \le |\alpha_2| \le \cdots \le |\alpha_n|$.

With these notations, we have the following theorem for the distance 
between the eigenvalues of a Hermitian and a skew-Hermitian matrix, in 
the Schatten p-norms. 


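Theorem VI.2.5 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix.
Then,

(i) for $2 \le p \le \infty$, we have

$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \le \|A - B\|_p, \tag{VI.17}$$

$$\|A - B\|_p \le 2^{\frac{1}{2} - \frac{1}{p}} \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p; \tag{VI.18}$$

(ii) for $1 \le p \le 2$, we have

$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \le 2^{\frac{1}{p} - \frac{1}{2}} \|A - B\|_p, \tag{VI.19}$$

$$\|A - B\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p. \tag{VI.20}$$

All the inequalities above are sharp. Further,

(iii) for $2 \le p \le \infty$, we have

$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \le \|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \tag{VI.21}$$

for all permutations $\sigma$;

(iv) for $1 \le p \le 2$, we have

$$\|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)\|_p \le \|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p \le \|\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)\|_p \tag{VI.22}$$

for all permutations $\sigma$.

Proof. For $p \ge 2$, the inequalities (VI.17) and (VI.18) follow immediately
from (VI.13) and (VI.16). For $p = \infty$, the same inequalities remain valid by
a limiting argument. The next two inequalities of the theorem follow from
the fact that, for $1 \le p \le 2$, both of the inequalities (VI.13) and (VI.16)
are reversed.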

The special case of statements (i) and (ii) in which A and B commute is 
adequate for proving (iii) and (iv). 

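The sharpness of all the inequalities can be seen from the $2 \times 2$ example:

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \tag{VI.23}$$

Here $\|A - B\|_p = 2$ for all $1 \le p \le \infty$. The eigenvalues of $A$ are $\pm 1$, those
of $B$ are $\pm i$. Hence, for every permutation $\sigma$

$$\frac{\|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_p}{\|A - B\|_p} = 2^{\frac{1}{p} - \frac{1}{2}}$$

for all $1 \le p \le \infty$.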
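The ratio in this example can be checked numerically. The sketch below assumes NumPy is available; the helper names `schatten_norm` and `eigenvalue_ratio` are illustrative and not part of the text. It compares the computed ratio with $2^{1/p - 1/2}$ for one fixed pairing of the eigenvalues.

```python
import numpy as np

def schatten_norm(X, p):
    """Schatten p-norm from the singular values (p = inf gives the operator norm)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(s.max()) if np.isinf(p) else float((s ** p).sum() ** (1.0 / p))

A = np.array([[0, 1], [1, 0]], dtype=complex)    # Hermitian, eigenvalues +1, -1
B = np.array([[0, 1], [-1, 0]], dtype=complex)   # skew-Hermitian, eigenvalues +i, -i

def eigenvalue_ratio(p):
    """||Eig(A) - Eig_sigma(B)||_p / ||A - B||_p for the pairing (1, i), (-1, -i)."""
    alpha = np.array([1, -1], dtype=complex)
    beta = np.array([1j, -1j])
    return schatten_norm(np.diag(alpha - beta), p) / schatten_norm(A - B, p)

for p in (1.0, 1.5, 2.0, 4.0):
    print(p, eigenvalue_ratio(p), 2 ** (1 / p - 0.5))
```

Every pairing gives the same ratio here because each difference $\pm 1 - (\pm i)$ has modulus $\sqrt{2}$.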


Note that the inequality (VI.6) is included in (VI.17). 

There are several features of these inequalities that are different from
the corresponding inequality (IV.62) for a pair of Hermitian matrices $A, B$.
First, the inequalities (VI.18) and (VI.19) involve a constant term on the
right that is bigger than 1. Second, the best choice of this term depends
on the norm $\|\cdot\|_p$. Third, the optimal matching of the eigenvalues of $A$
with those of $B$ (the one that minimises the distance between them)
changes with the norm. In fact, the best pairing for the norms $2 \le p \le \infty$
is the worst one for the norms $1 \le p \le 2$, and vice versa.

All these new features reveal that the spectral variation problem for pairs 
of normal matrices A,B is far more intricate than the one for Hermitian 
pairs. 


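Exercise VI.2.6 Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix.
Show that for every unitarily invariant norm we have

$$|||\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\uparrow|}(B)||| \le \sqrt{2}\, |||A - B|||, \tag{VI.24}$$

$$|||A - B||| \le \sqrt{2}\, |||\mathrm{Eig}^{|\downarrow|}(A) - \mathrm{Eig}^{|\downarrow|}(B)|||. \tag{VI.25}$$

The term $\sqrt{2}$ in the second inequality cannot be replaced by anything smaller.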


VI.3 Estimates in the Operator Norm


In this section we will obtain estimates of the distance between the eigenvalues
of two normal matrices $A$ and $B$ in terms of $\|A - B\|$. Apart from
the optimal matching distance, which has already been introduced, we will
consider other distances.
If $L, M$ are two closed subsets of the complex plane $\mathbb{C}$, let

$$s(L, M) = \sup_{\lambda \in L} \mathrm{dist}(\lambda, M) = \sup_{\lambda \in L} \inf_{\mu \in M} |\lambda - \mu|. \tag{VI.26}$$


The Hausdorff distance between L and M is defined as 
h(L, M) = max(s(L,M), s(M,L)). (VI.27) 


Exercise VI.3.1 Show that $s(L, M) = 0$ if and only if $L$ is a subset of $M$.
Show that the Hausdorff distance defines a metric on the collection of all 
closed subsets of C. 


Note that $s(L, M)$ is the smallest number $\delta$ such that every element of
$L$ is within a distance $\delta$ of some element of $M$; and $h(L, M)$ is the smallest
number $\delta$ for which this, as well as the symmetric assertion with $L$ and $M$
interchanged, is true.

Let $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$ be two unordered $n$-tuples of complex
numbers. Let $L$ and $M$ be the subsets of $\mathbb{C}$ whose elements are the entries of
these two tuples. If some entry among $\{\lambda_j\}$ or $\{\mu_j\}$ has multiplicity bigger
than 1, then the cardinality of $L$ or $M$ is smaller than $n$.


Exercise VI.3.2 (i) The Hausdorff distance $h(L, M)$ is always less than
or equal to the optimal matching distance $d(\{\lambda_1, \ldots, \lambda_n\}, \{\mu_1, \ldots, \mu_n\})$.

(ii) When $n = 2$, the two distances are equal.

(iii) The triples $\{0, m - \varepsilon, m + \varepsilon\}$ and $\{m, \varepsilon, -\varepsilon\}$ provide an example in
which $h(L, M) = \varepsilon$ and the optimal matching distance is $m - 2\varepsilon$. Thus, for
$n = 3$, the second distance can be arbitrarily larger than the first.
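The two distances in part (iii) can be compared directly by brute force. The following is a minimal sketch in plain Python; the function names are illustrative, and the values $m = 10$, $\varepsilon = 1$ are chosen here only for the demonstration.

```python
from itertools import permutations

def s_dist(L, M):
    """One-sided distance s(L, M): max over lam in L of min over mu in M of |lam - mu|."""
    return max(min(abs(lam - mu) for mu in M) for lam in L)

def hausdorff(L, M):
    """Hausdorff distance h(L, M) = max(s(L, M), s(M, L))."""
    return max(s_dist(L, M), s_dist(M, L))

def matching_distance(lams, mus):
    """Optimal matching distance: min over permutations of the largest paired gap.

    Brute force over all n! pairings, so only suitable for small n.
    """
    return min(max(abs(l - m) for l, m in zip(lams, perm))
               for perm in permutations(mus))

m, eps = 10.0, 1.0
lams = (0.0, m - eps, m + eps)
mus = (m, eps, -eps)
print(hausdorff(lams, mus), matching_distance(lams, mus))
```

For these values the Hausdorff distance equals $\varepsilon$ while the matching distance equals $m - 2\varepsilon$, so increasing $m$ widens the gap without changing $h$.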


If $A$ is an $n \times n$ matrix, we will use the notation $\sigma(A)$ for both the subset
of the complex plane that consists of all the eigenvalues of $A$, and for the
unordered $n$-tuple whose entries are the eigenvalues of $A$ counted with
multiplicity. Since we will be talking of the distances $s(\sigma(A), \sigma(B))$, $h(\sigma(A), \sigma(B))$,
and $d(\sigma(A), \sigma(B))$, it will be clear which of the two objects is being
represented by $\sigma(A)$.


Note that the inequalities (VI.4) and (VI.6) say that

$$d(\sigma(A), \sigma(B)) \le \|A - B\|, \tag{VI.28}$$


if either A and B are both Hermitian, or one is Hermitian and the other 
skew-Hermitian. 


Theorem VI.3.3 Let $A$ be a normal and $B$ an arbitrary matrix. Then

$$s(\sigma(B), \sigma(A)) \le \|A - B\|. \tag{VI.29}$$

Proof. Let $\varepsilon = \|A - B\|$. We have to show that if $\beta$ is any eigenvalue of
$B$, then $\beta$ is within a distance $\varepsilon$ of some eigenvalue $\alpha_j$ of $A$.

By applying a translation, we may assume that $\beta = 0$. If none of the $\alpha_j$
is within a distance $\varepsilon$ of this, then $A$ is invertible. Since $A$ is normal, we
have $\|A^{-1}\| = \frac{1}{\min_j |\alpha_j|} < \frac{1}{\varepsilon}$. Hence,

$$\|A^{-1}(B - A)\| \le \|A^{-1}\| \, \|B - A\| < 1.$$

Since $B = A(I + A^{-1}(B - A))$, this shows that $B$ is invertible. But then
$B$ could not have had a zero eigenvalue. ■


Another proof of this theorem goes as follows. Let $A$ have the spectral
resolution $A = \sum_j \alpha_j u_j u_j^*$, and let $v$ be a unit vector such that $Bv = \beta v$.
Then

$$\|A - B\|^2 \ge \|(A - B)v\|^2 = \Big\| \sum_j \alpha_j (u_j^* v) u_j - \beta \sum_j (u_j^* v) u_j \Big\|^2 = \sum_j |\alpha_j - \beta|^2 \, |u_j^* v|^2.$$

Since the $u_j$ form an orthonormal basis, $\sum_j |u_j^* v|^2 = 1$. Hence, the above
inequality can be satisfied only if $|\alpha_j - \beta|^2 \le \|A - B\|^2$ for at least one
index $j$.
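Theorem VI.3.3 lends itself to a quick numerical experiment. The sketch below assumes NumPy; the helpers `random_normal_matrix`, `s_dist` and `check_theorem`, the seed, and the perturbation size 0.3 are all choices made here for illustration. It builds a random normal $A$, an arbitrary (generally non-normal) $B$, and returns both sides of (VI.29).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_normal_matrix(n):
    """Return Q diag(lam) Q* with Q unitary (from a QR factorisation) and random complex lam."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    lam = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    return Q @ np.diag(lam) @ Q.conj().T

def s_dist(L, M):
    """s(L, M): max over lam in L of the distance from lam to the finite set M."""
    return max(min(abs(lam - mu) for mu in M) for lam in L)

def check_theorem(n):
    """Return (s(sigma(B), sigma(A)), ||A - B||) for a random normal A and arbitrary B."""
    A = random_normal_matrix(n)
    B = A + 0.3 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    lhs = s_dist(np.linalg.eigvals(B), np.linalg.eigvals(A))
    rhs = np.linalg.norm(A - B, 2)   # operator norm = largest singular value
    return lhs, rhs

print(check_theorem(5))
```

Note that the roles are not symmetric: the theorem bounds $s(\sigma(B), \sigma(A))$, not $s(\sigma(A), \sigma(B))$, when only $A$ is normal.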


Corollary VI.3.4 If $A$ and $B$ are $n \times n$ normal matrices, then

$$h(\sigma(A), \sigma(B)) \le \|A - B\|. \tag{VI.30}$$

For $n = 2$, we have

$$d(\sigma(A), \sigma(B)) \le \|A - B\|. \tag{VI.31}$$

This corollary also follows from the proposition below.
We will use the notation $D(a, \rho)$ for the open disk of radius $\rho$ centred at
$a$, and $\overline{D}(a, \rho)$ for the closure of this disk.


Proposition VI.3.5 Let $A$ and $B$ be normal matrices, and let $\varepsilon =
\|A - B\|$. If any disk $\overline{D}(a, \rho)$ contains $k$ eigenvalues of $A$, then the disk
$\overline{D}(a, \rho + \varepsilon)$ contains at least $k$ eigenvalues of $B$.

Proof. Without loss of generality, we may assume that $a = 0$. Suppose
$\overline{D}(0, \rho)$ contains $k$ eigenvalues of $A$ but $\overline{D}(0, \rho + \varepsilon)$ contains fewer than $k$
eigenvalues of $B$. Choose a unit vector $x$ in the intersection of the eigenspace
of $A$ corresponding to its eigenvalues lying inside $\overline{D}(0, \rho)$ and the eigenspace
of $B$ corresponding to its eigenvalues lying outside $\overline{D}(0, \rho + \varepsilon)$. (These
subspaces have dimensions at least $k$ and more than $n - k$, respectively, so their
intersection is nontrivial.) We then have
$\|Ax\| \le \rho$ and $\|Bx\| > \rho + \varepsilon$. We also have $\|Bx\| - \|Ax\| \le \|(B - A)x\| \le
\|B - A\| = \varepsilon$. This is a contradiction. ■

Exercise VI.3.6 Use the special case $\rho = 0$ of Proposition VI.3.5 to prove
Corollary VI.3.4.


Given a subset $X$ of the complex plane and a matrix $A$, let $m_A(X)$
denote the number of eigenvalues of $A$ inside $X$.


Exercise VI.3.7 Let $A, B$ be two $n \times n$ normal matrices. Let $K_1, K_2$ be
two convex sets such that $m_A(K_1) \ge k$ and $m_B(K_2) \ge n - k + 1$. Then
$\mathrm{dist}(K_1, K_2) \le \|A - B\|$. [Hint: Let $\rho \to \infty$ in Proposition VI.3.5.]


Exercise VI.3.8 Use this to give another proof of Theorem VI.2.1. 


Exercise VI.3.9 Let $A, B$ be two $n \times n$ unitary matrices whose eigenvalues
lie in a semicircle of the unit circle. Label both the sets of eigenvalues in
the counterclockwise order. Then

$$\max_j |\lambda_j(A) - \lambda_j(B)| \le \|A - B\|.$$

Hence,

$$d(\sigma(A), \sigma(B)) \le \|A - B\|.$$


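Exercise VI.3.10 Let $\mathbb{T}$ be the unit circle, $I$ any closed arc in $\mathbb{T}$, and for
$\varepsilon \ge 0$ let $I_\varepsilon$ be the arc $\{z \in \mathbb{T} : |z - w| \le \varepsilon \text{ for some } w \in I\}$. Let $A, B$ be
unitary matrices with $\|A - B\| = \varepsilon$. Show that $m_A(I) \le m_B(I_\varepsilon)$.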


Theorem VI.3.11 For any two unitary matrices,

$$d(\sigma(A), \sigma(B)) \le \|A - B\|.$$


Proof. The proof will use the Marriage Theorem (Theorem II.2.1) and
the exercise above. Throughout, $\varepsilon = \|A - B\|$.

Let $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$ be the eigenvalues of $A$ and $B$,
respectively. Let $\Lambda$ be any subset of $\{\lambda_1, \ldots, \lambda_n\}$. Let
$\mu(\Lambda) = \{\mu_j : |\mu_j - \lambda_i| \le \varepsilon \text{ for some } \lambda_i \in \Lambda\}$.
By the Marriage Theorem, the assertion would be proved
if we show that $|\mu(\Lambda)| \ge |\Lambda|$.

Let $I(\Lambda)$ be the set of all points on the unit circle $\mathbb{T}$ that are within
distance $\varepsilon$ of some point of $\Lambda$. Then $\mu(\Lambda)$ contains exactly those $\mu_j$ that
lie in $I(\Lambda)$. Let $I(\Lambda)$ be written as a disjoint union of arcs $I_1, \ldots, I_r$. For
each $1 \le k \le r$, let $J_k$ be the arc contained in $I_k$ all whose points are at
distance $\ge \varepsilon$ from the boundary of $I_k$. Then $I_k = (J_k)_\varepsilon$.

From Exercise VI.3.10 we have

$$\sum_{k=1}^{r} m_A(J_k) \le \sum_{k=1}^{r} m_B(I_k) = m_B(I(\Lambda)).$$

But all the elements of $\Lambda$ are in some $J_k$. This shows that $|\Lambda| \le |\mu(\Lambda)|$. ■


There is one difference between Theorem VI.3.11 and most of our earlier 
results of this type. Now nothing is said about the order in which the
eigenvalues of A and B are arranged for the optimal matching. No canonical
order can be prescribed in general. In Problem VI.8.3, we outline another 
proof of Theorem VI.3.11 which says, in effect, that for optimal matching 
the eigenvalues of A and B can be counted in the cyclic order on the circle 
provided the initial point is chosen properly. The catch is that this initial 
point depends on A and B and we do not know how to find it. 


Exercise VI.3.12 Let $A = c_1 U_1$, $B = c_2 U_2$, where $U_1, U_2$ are unitary
matrices and $c_1, c_2$ are complex numbers. Show that
$d(\sigma(A), \sigma(B)) \le \|A - B\|$.


By now we have seen that the inequality (VI.28) is valid in the following
situations:

(i) $A$ and $B$ both Hermitian

(ii) $A$ Hermitian and $B$ skew-Hermitian

(iii) $A$ and $B$ both unitary (or both scalar multiples of unitaries)

(iv) $A$ and $B$ both $2 \times 2$ normal matrices.

The example below shows that this inequality breaks down for arbitrary
normal matrices $A, B$ when $n \ge 3$.


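Example VI.3.13 Let $A$ be the $3 \times 3$ diagonal matrix with diagonal entries
$\lambda_1 = 1$, $\lambda_2 = \frac{4 + 5\sqrt{3}\, i}{13}$, $\lambda_3 = \frac{-1 + 2\sqrt{3}\, i}{13}$.
Let $v^T = \left( \sqrt{\tfrac{5}{8}}, \tfrac{1}{2}, \sqrt{\tfrac{1}{8}} \right)$ and let
$U = I - 2vv^T$. Then $U$ is a unitary matrix. Let $B = -U^* A U$. Then $B$ is a
normal matrix with eigenvalues $\mu_j = -\lambda_j$, $j = 1, 2, 3$. One can check that

$$d(\sigma(A), \sigma(B)) = \sqrt{\frac{28}{13}}, \qquad \|A - B\| = \sqrt{\frac{27}{13}}.$$

So,

$$\frac{d(\sigma(A), \sigma(B))}{\|A - B\|} = 1.0183^{+}.$$

In the next chapter we will show that there exists a constant $c < 2.91$
such that for any two $n \times n$ normal matrices $A, B$

$$d(\sigma(A), \sigma(B)) \le c \, \|A - B\|.$$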
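The two values quoted in Example VI.3.13 can be reproduced numerically. The sketch below assumes NumPy and takes the eigenvalues $\lambda_1 = 1$, $\lambda_2 = (4 + 5\sqrt{3}\,i)/13$, $\lambda_3 = (-1 + 2\sqrt{3}\,i)/13$ and the vector $v^T = (\sqrt{5/8}, 1/2, \sqrt{1/8})$ as the data of the example; the helper `matching_distance` is an illustrative brute-force routine, not something defined in the text.

```python
import numpy as np
from itertools import permutations

s3 = np.sqrt(3.0)
lam = np.array([1.0, (4 + 5j * s3) / 13, (-1 + 2j * s3) / 13])
A = np.diag(lam)

v = np.array([np.sqrt(5 / 8), 0.5, np.sqrt(1 / 8)])   # unit vector
U = np.eye(3) - 2 * np.outer(v, v)                    # real reflection, hence unitary
B = -U.conj().T @ A @ U                               # normal, eigenvalues -lam

def matching_distance(lams, mus):
    """Brute-force optimal matching distance over all pairings (fine for n = 3)."""
    return min(max(abs(l - m) for l, m in zip(lams, perm))
               for perm in permutations(mus))

d = matching_distance(lam, np.linalg.eigvals(B))
norm = np.linalg.norm(A - B, 2)                       # operator norm
print(d, norm, d / norm)
```

The point of the example is the strict inequality `d > norm`, which the printed ratio makes visible.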
For Hermitian matrices $A, B$ we have a reverse inequality:

$$\|A - B\| \le \max_j |\lambda_j^{\downarrow}(A) - \lambda_j^{\uparrow}(B)|.$$

The quantity on the right is the distance between the eigenvalues of $A$
and those of $B$ when the "worst" pairing is made. An analogous result for
normal matrices is proved below.

Theorem VI.3.14 Let $A$ and $B$ be normal matrices with eigenvalues
$\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$, respectively. Then, there exists a
permutation $\sigma$ such that

$$\|A - B\| \le \sqrt{2} \max_{1 \le j \le n} |\lambda_j - \mu_{\sigma(j)}|. \tag{VI.32}$$


Proof. The matrices $A \otimes I$ and $I \otimes B^T$ are both normal and commute with
each other. Hence $A \otimes I - I \otimes B^T$ is normal. The eigenvalues of this matrix
are all the differences $\lambda_i - \mu_j$, $1 \le i, j \le n$. Hence

$$\|A \otimes I - I \otimes B^T\| = \max_{i,j} |\lambda_i - \mu_j|.$$

So, the inequality (VI.32) is equivalent to

$$\|A - B\| \le \sqrt{2} \, \|A \otimes I - I \otimes B^T\|.$$

This is, in fact, true for all $A, B$ and is proved below. ■


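Theorem VI.3.15 For all matrices $A, B$

$$\|A - B\| \le \sqrt{2} \, \|A \otimes I - I \otimes B^T\|. \tag{VI.33}$$

Proof. We have to prove that for all $x, y$ in $\mathbb{C}^n$

$$|\langle x, (A - B)y \rangle| \le \sqrt{2} \, \|A \otimes I - I \otimes B^T\| \, \|x\| \, \|y\|.$$

We have

$$|\langle x, (A - B)y \rangle| = |x^* A y - x^* B y| = |\mathrm{tr}(A y x^* - y x^* B)| \le \|A y x^* - y x^* B\|_1.$$

The matrix $A y x^* - y x^* B$ has rank at most 2. So,

$$\|A y x^* - y x^* B\|_1 \le \sqrt{2} \, \|A y x^* - y x^* B\|_2.$$

Let $\bar{x}$ be the vector whose components are the complex conjugates of the
components of $x$. Then with respect to the standard basis $e_i \otimes e_j$ of $\mathbb{C}^n \otimes \mathbb{C}^n$,
the $(i, j)$-coordinate of the vector $(A \otimes I)(y \otimes \bar{x})$ is $\sum_k a_{ik} y_k \bar{x}_j$. This is also
the $(i, j)$-entry of the matrix $A y x^*$. In the same way, the $(i, j)$-entry of
$y x^* B$ is the $(i, j)$-coordinate of the vector $(I \otimes B^T)(y \otimes \bar{x})$. Thus, we have

$$\|A y x^* - y x^* B\|_2 = \|(A \otimes I - I \otimes B^T)(y \otimes \bar{x})\| \le \|A \otimes I - I \otimes B^T\| \, \|y \otimes \bar{x}\| = \|A \otimes I - I \otimes B^T\| \, \|x\| \, \|y\|.$$

This proves the theorem. ■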
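The inequality of Theorem VI.3.15 can be tried out with Kronecker products. The sketch below assumes NumPy and reads the right-hand side as $\sqrt{2}\,\|A \otimes I - I \otimes B^T\|$, which is the form the proof produces; `op_norm` and `both_sides` are names introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def op_norm(X):
    """Operator (spectral) norm: the largest singular value."""
    return np.linalg.norm(X, 2)

def both_sides(A, B):
    """Return (||A - B||, sqrt(2) * ||A (x) I - I (x) B^T||) for square A, B of equal size."""
    n = A.shape[0]
    I = np.eye(n)
    T = np.kron(A, I) - np.kron(I, B.T)   # B.T is the plain transpose, no conjugation
    return op_norm(A - B), np.sqrt(2) * op_norm(T)

# Arbitrary (non-normal) complex matrices.
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
print(both_sides(A, B))

# The 2 x 2 pair with A = [[0, 1], [1, 0]] and B = [[0, 1], [-1, 0]] gives equality.
A2 = np.array([[0, 1], [1, 0]], dtype=complex)
B2 = np.array([[0, 1], [-1, 0]], dtype=complex)
print(both_sides(A2, B2))
```

For the second pair both sides equal 2, which illustrates that the factor $\sqrt{2}$ cannot be lowered.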


The example (VI.23) shows that the inequality (VI.32) is sharp. Note
that in this example $A$ and $B$ are both unitary. Also, $A$ is Hermitian and
$B$ is skew-Hermitian. In contrast, the factor $\sqrt{2}$ in (VI.32) can be replaced
by 1 if $A, B$ are both Hermitian.

We will use the symbol $S_n$ to mean the set of permutations on $n$ symbols,
as well as the set of $n \times n$ permutation matrices. (To every permutation $\sigma$
there corresponds a unique matrix $P$ that has entries 1 in the $(i, j)$ place if
and only if $j = \sigma(i)$, and all whose remaining entries are zero.) Let $\Omega_n$ be
the set of all $n \times n$ doubly stochastic matrices. This is a convex polytope and
by Birkhoff's Theorem (Theorem II.2.3) its extreme points are permutation
matrices.


Theorem VI.4.1 (Hoffman-Wielandt) Let $A$ and $B$ be normal matrices
with eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$ and $\{\mu_1, \ldots, \mu_n\}$, respectively. Then

$$\min_{\sigma \in S_n} \Big( \sum_{i=1}^{n} |\lambda_i - \mu_{\sigma(i)}|^2 \Big)^{1/2} \le \|A - B\|_2 \le \max_{\sigma \in S_n} \Big( \sum_{i=1}^{n} |\lambda_i - \mu_{\sigma(i)}|^2 \Big)^{1/2}. \tag{VI.34}$$

Proof. Choose unitary matrices $U, V$ such that $UAU^* = D_1$, $VBV^* = D_2$,
where $D_1 = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $D_2 = \mathrm{diag}(\mu_1, \ldots, \mu_n)$. Then, by
unitary invariance of the Frobenius norm,
$\|A - B\|_2^2 = \|U^* D_1 U - V^* D_2 V\|_2^2 = \|D_1 W - W D_2\|_2^2$,
where $W = UV^*$, another unitary matrix. If the matrix
$W$ has entries $w_{ij}$, this can be written as

$$\|A - B\|_2^2 = \sum_{i,j} |\lambda_i - \mu_j|^2 \, |w_{ij}|^2.$$

The matrix $(|w_{ij}|^2)$ is doubly stochastic. The map
$(r_{ij}) \mapsto \sum_{i,j} |\lambda_i - \mu_j|^2 \, r_{ij}$
is an affine function on the set $\Omega_n$ of doubly stochastic matrices. So it
attains its minimum at one of the extreme points of $\Omega_n$. Thus, there exists
a permutation matrix $(p_{ij})$ such that

$$\|A - B\|_2^2 \ge \sum_{i,j} |\lambda_i - \mu_j|^2 \, p_{ij}.$$

If this matrix corresponds to the permutation $\sigma$, this says that

$$\|A - B\|_2^2 \ge \sum_{i} |\lambda_i - \mu_{\sigma(i)}|^2.$$

This proves the first inequality in (VI.34). The same argument for the
maximum instead of the minimum gives the other inequality. ■
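Both the key identity in this proof and the resulting bounds can be checked on a small random instance. The sketch below assumes NumPy; the names `random_unitary` and `hoffman_wielandt_bounds`, the seed, and the size $n = 4$ are choices made here, and the permutation search is brute force.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)

def random_unitary(n):
    """Unitary factor Q of the QR factorisation of a random complex matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

def hoffman_wielandt_bounds(lam, mu):
    """Min and max over permutations of (sum_i |lam_i - mu_sigma(i)|^2)^(1/2)."""
    vals = [np.sqrt(sum(abs(l - m) ** 2 for l, m in zip(lam, perm)))
            for perm in permutations(mu)]
    return min(vals), max(vals)

n = 4
lam = rng.standard_normal(n) + 1j * rng.standard_normal(n)
mu = rng.standard_normal(n) + 1j * rng.standard_normal(n)
U, V = random_unitary(n), random_unitary(n)
A = U.conj().T @ np.diag(lam) @ U          # so that U A U* = D1
B = V.conj().T @ np.diag(mu) @ V           # so that V B V* = D2

lo, hi = hoffman_wielandt_bounds(lam, mu)
frob = np.linalg.norm(A - B, "fro")

# Identity from the proof: ||A - B||_2^2 = sum_ij |lam_i - mu_j|^2 |w_ij|^2, W = U V*.
W = U @ V.conj().T
weighted = (abs(lam[:, None] - mu[None, :]) ** 2 * abs(W) ** 2).sum()
print(lo, frob, hi, weighted)
```

The matrix `abs(W) ** 2` is the doubly stochastic matrix of the proof, so `weighted` is an average of the permutation sums and must lie between `lo ** 2` and `hi ** 2`.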


Note that for Hermitian matrices, the inequality (VI.34) was proved earlier
in Problem III.6.15. In this case, we also proved that the same inequality
is true for all unitarily invariant norms.

In general, there is no prescription for finding the permutation $\sigma$ that
minimises the Euclidean distance between the eigenvalues of $A$ and those
of $B$. However, if $A$ is Hermitian with eigenvalues enumerated as
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, then an enumeration of $\mu_j$ in which
$\mathrm{Re}\, \mu_1 \ge \mathrm{Re}\, \mu_2 \ge \cdots \ge \mathrm{Re}\, \mu_n$ is the best one. To see this, just note
that if $\lambda_1 \ge \lambda_2$ and $\mathrm{Re}\, \mu_1 \ge \mathrm{Re}\, \mu_2$, then

$$|\lambda_1 - \mu_1|^2 + |\lambda_2 - \mu_2|^2 \le |\lambda_1 - \mu_2|^2 + |\lambda_2 - \mu_1|^2.$$

The same argument shows that an enumeration for which the maximum
distance is attained is one for which $\mathrm{Re}\, \mu_1 \le \mathrm{Re}\, \mu_2 \le \cdots \le \mathrm{Re}\, \mu_n$. (What
does this say when $B$ is skew-Hermitian?)
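The pairwise inequality used in this argument follows by expanding the squares. The short computation below is a sketch of that step; it uses only that $\lambda_1, \lambda_2$ are real, so that $|\lambda - \mu|^2 = \lambda^2 + |\mu|^2 - 2\lambda\, \mathrm{Re}\, \mu$.

```latex
\begin{aligned}
&\bigl(|\lambda_1-\mu_2|^2+|\lambda_2-\mu_1|^2\bigr)
 -\bigl(|\lambda_1-\mu_1|^2+|\lambda_2-\mu_2|^2\bigr)\\
&\qquad = 2\lambda_1\,\mathrm{Re}\,\mu_1 + 2\lambda_2\,\mathrm{Re}\,\mu_2
        - 2\lambda_1\,\mathrm{Re}\,\mu_2 - 2\lambda_2\,\mathrm{Re}\,\mu_1\\
&\qquad = 2(\lambda_1-\lambda_2)(\mathrm{Re}\,\mu_1-\mathrm{Re}\,\mu_2)\;\ge\;0 .
\end{aligned}
```

Only the real parts of the $\mu_j$ enter, which is why the ordering by real parts decides the best and the worst pairing.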

Using the notations introduced in Section VI.2, the inequality (VI.34)
can be rewritten as

$$\min_{\sigma} \|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_2 \le \|A - B\|_2 \le \max_{\sigma} \|\mathrm{Eig}(A) - \mathrm{Eig}_\sigma(B)\|_2. \tag{VI.35}$$

There is another way of looking at this. Since the eigenvalues of a normal
matrix completely determine the matrix up to a unitary conjugation, the
inequality (VI.35) is equivalent to saying that for any two diagonal matrices
$A, B$

$$\min_{P} \|A - PBP^*\|_2 \le \|A - UBU^*\|_2 \le \max_{P} \|A - PBP^*\|_2, \tag{VI.36}$$

where $U$ is any unitary matrix and $P$ varies over all permutation matrices.
Given any matrix $B$, let $\mathcal{U}_B$ be the set

$$\mathcal{U}_B = \{UBU^* : U \in U(n)\},$$

where $U(n)$ is the group consisting of unitary matrices. Then $\mathcal{U}_B$ is a
compact set called the unitary orbit of $B$. For a fixed diagonal matrix $A$,
consider the function $f(X) = \|A - X\|_2$. The inequality (VI.36) then says
that if $B$ is another diagonal matrix, then on the compact set $\mathcal{U}_B$ both the
minimum and the maximum of $f$ are attained at diagonal matrices (just
some permutations of $B$). In other words, the minimum and the maximum
on the unitary orbit are both attained on the permutation orbit.

This is an interesting fact from the point of view of calculus and geom- 
etry. We will see below that if A, B are real diagonal matrices, a stronger 
statement can be proved using calculus. This will also serve to introduce 
some elementary ideas of differential geometry used in later sections. 

A differentiable function $U(t)$, where $t$ is real and $U(t)$ is unitary, is called
a differentiable curve through $I$ if $U(0) = I$. Differentiating the equation
$U(t)U(t)^* = I$ at $t = 0$ shows that for such a curve $U'(0)$ is skew-Hermitian.
The matrix $U'(0)$ is called the tangent vector to $U(t)$ at $I$. If $K = U'(0)$,
then $e^{tK}$ is another differentiable curve through $I$ with tangent vector $K$
at $I$. Thus, the curves $U(t)$ and $e^{tK}$ have the same tangent vector and so
represent the same curve locally, i.e., they are equal to the first degree of
approximation. The tangent space to the manifold $U(n)$ at the point $I$ is
the linear space that consists of all these tangent vectors. We have seen that
this is the real vector space $\mathcal{K}(n)$ consisting of all skew-Hermitian matrices.

If $\mathcal{U}_A$ is the unitary orbit of a matrix $A$, then every differentiable curve
through $A$ can be represented locally as $e^{tK} A e^{-tK}$ for some skew-Hermitian
$K$. The derivative of this curve at $t = 0$ is $KA - AK$. This is usually written
as $[K, A]$ and called a Lie bracket or a commutator. Thus the tangent
space to the manifold $\mathcal{U}_A$ at the point $A$ is the space

$$T_A \mathcal{U}_A = \{[A, K] : K \in \mathcal{K}(n)\}. \tag{VI.37}$$

Note that this implies that $T_A \mathcal{U}_A$ is contained in $\mathcal{K}(n)$ if $A \in \mathcal{K}(n)$.
The sesquilinear form $\langle A, B \rangle = \mathrm{tr}\, A^* B$ is an inner product on the space
$M(n)$. The symbol $S^{\perp}$ will mean the orthogonal complement of a space $S$
with respect to this inner product.


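Lemma VI.4.2 For every $A \in \mathcal{K}(n)$, the orthogonal complement of $T_A \mathcal{U}_A$
in $\mathcal{K}(n)$ is the set of all $Y$ that commute with $A$.

Proof. Let $Y \in \mathcal{K}(n)$. Then $Y \in (T_A \mathcal{U}_A)^{\perp}$ if and only if for every $K$ in
$\mathcal{K}(n)$ we have

$$0 = \langle Y, [A, K] \rangle = \mathrm{tr}\, Y^*(AK - KA) = -\mathrm{tr}(YAK - YKA) = \mathrm{tr}\, [A, Y] K.$$

This is possible if and only if $[A, Y] = 0$. (For the nontrivial direction, take
$K = [A, Y]$, which is itself skew-Hermitian.) ■

The set of all matrices $Y$ that commute with $A$ is called the commutant
or the centraliser of $A$, and is denoted as $Z(A)$. The lemma above says
that in the space $\mathcal{K}(n)$, $(T_A \mathcal{U}_A)^{\perp} = Z(A)$ for every $A$.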


Theorem VI.4.3 Let $A \in \mathcal{K}(n)$ and let $f(X) = \|A - X\|_2$. Let $B$ be any
other element of $\mathcal{K}(n)$. Then a point $B_0$ of the unitary orbit $\mathcal{U}_B$ is an
extreme point for the function $f$ on $\mathcal{U}_B$ if and only if $B_0$ commutes with $A$.

Proof. A point $B_0$ is an extreme point if and only if the straight line
joining $A$ and $B_0$ is perpendicular to $\mathcal{U}_B$ at $B_0$. By Lemma VI.4.2 this is
so if and only if $A - B_0$ commutes with $B_0$, i.e., if and only if $A$ commutes
with $B_0$. ■


For skew-Hermitian (or Hermitian) matrices $A, B$, this gives another
proof of Theorem VI.4.1. However, in this case Theorem VI.4.3 says much
more. From the first theorem we can conclude that if $A$ and $B$ are normal,
then the global minimum and maximum of the (Frobenius) distance from
$A$ to $\mathcal{U}_B$ are attained among matrices that commute with $A$. The second
theorem says that when $A$ and $B$ are both Hermitian this is true for all
local extrema as well.

This last statement is not true when $A$ is Hermitian and $B$ is
skew-Hermitian. For in this case,

$$\|A - UBU^*\|_2^2 = \|A\|_2^2 + \|UBU^*\|_2^2 = \|A\|_2^2 + \|B\|_2^2$$

for all $U$. Thus the entire orbit $\mathcal{U}_B$ is at a constant distance from $A$. Hence,
every point on $\mathcal{U}_B$ is an extremal point. However, not every point on $\mathcal{U}_B$
need commute with $A$.
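The constant-distance identity rests on the cross term vanishing. As a sketch, write $K = UBU^*$, which is again skew-Hermitian, and expand the Frobenius norm with the inner product $\langle X, Y \rangle = \mathrm{tr}\, X^* Y$:

```latex
\begin{aligned}
\|A-K\|_2^2 &= \mathrm{tr}\,(A-K)^*(A-K)
             = \|A\|_2^2+\|K\|_2^2-2\,\mathrm{Re}\,\mathrm{tr}\,A^*K,\\
\overline{\mathrm{tr}\,A^*K} &= \mathrm{tr}\,K^*A = -\,\mathrm{tr}\,KA
             = -\,\mathrm{tr}\,AK = -\,\mathrm{tr}\,A^*K .
\end{aligned}
```

The second line uses $K^* = -K$ and $A^* = A$; it shows that $\mathrm{tr}\, A^* K$ is purely imaginary, so its real part is zero and $\|A - K\|_2^2 = \|A\|_2^2 + \|K\|_2^2$.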


VI.5 Geometry and Spectral Variation: the 
Operator Norm 


The first theorem below says that if $A$ is a normal matrix and $B$ is any
matrix close to $A$, then the optimal matching distance $d(\sigma(A), \sigma(B))$ is
bounded by $\|A - B\|$. This is a local phenomenon; global versions of this
are what we seek in the next paragraphs.


Theorem VI.5.1 Let $A$ be a normal matrix, and let $B$ be any matrix such
that $\|A - B\|$ is smaller than half the distance between any two distinct
eigenvalues of $A$. Then $d(\sigma(A), \sigma(B)) \le \|A - B\|$.


Proof. Let $\alpha_1, \ldots, \alpha_k$ be all the distinct eigenvalues of $A$. Let $\varepsilon =
\|A - B\|$. By Theorem VI.3.3, all the eigenvalues of $B$ lie in the union of
the disks $\overline{D}(\alpha_j, \varepsilon)$. By the hypothesis, these disks are mutually disjoint. We
claim that if the eigenvalue $\alpha_j$ has multiplicity $m_j$, then the disk $\overline{D}(\alpha_j, \varepsilon)$
contains exactly $m_j$ eigenvalues of $B$, counted with their respective
multiplicities. Once this is established, the statement of the theorem is seen to
follow easily.

Let $A(t) = (1 - t)A + tB$, $0 \le t \le 1$. This is a continuous map from
$[0, 1]$ into the space of matrices; and we have $A(0) = A$, $A(1) = B$. Note
that $\|A - A(t)\| = t\varepsilon$, and so all the eigenvalues of $A(t)$ also lie in the disks
$\overline{D}(\alpha_j, \varepsilon)$ for each $0 \le t \le 1$. By Corollary VI.1.6, as $t$ moves from 0 to 1,
the eigenvalues of $A(t)$ trace continuous curves that join the eigenvalues
of $A$ to those of $B$. None of these curves can jump from one of the disks
$\overline{D}(\alpha_j, \varepsilon)$ to another. So, if we start off with $m_j$ such curves in the disk
$\overline{D}(\alpha_j, \varepsilon)$, we must end up with exactly as many. ■


Example VI.3.13 shows that if no condition is imposed on $B$, then the
conclusion of the theorem above is no longer valid, even when $B$ is normal.
However, this does suggest a new approach to the problem. Let $A, B$ be
normal matrices, and let $\gamma(t)$ be a curve joining $A$ and $B$, such that each
$\gamma(t)$ is a normal matrix. Then in a small neighbourhood of $\gamma(t)$ the spectral
variation inequality of Theorem VI.5.1 holds. So, the (total) spectral
variation between the endpoints of the curve must be bounded by the length
of this curve. This idea is made precise below.

Let N denote the set of normal matrices of a fixed size n. If A is an 
element of N, then so is tA for all real t. Thus the set N is path connected. 
However, N is not an affine set. 

A continuous map $\gamma$ from any interval $[a, b]$ into N will be called a
normal path or a normal curve. If $\gamma(a) = A$ and $\gamma(b) = B$, we say
that $\gamma$ is a path joining $A$ and $B$; $A$ and $B$ are then the endpoints of $\gamma$.

The length of $\gamma$, with respect to the norm $\|\cdot\|$, is defined as

$$\ell_{\|\cdot\|}(\gamma) = \sup \sum_{k=0}^{m-1} \|\gamma(t_{k+1}) - \gamma(t_k)\|, \tag{VI.38}$$

where the supremum is taken over all partitions of $[a, b]$ as $a = t_0 < t_1 <
\cdots < t_m = b$. If this length is finite, the path $\gamma$ is said to be rectifiable. If
the function $\gamma$ is a piecewise $C^1$ function, then

$$\ell_{\|\cdot\|}(\gamma) = \int_a^b \|\gamma'(t)\| \, dt. \tag{VI.39}$$


Theorem VI.5.2 Let $A$ and $B$ be normal matrices, and let $\gamma$ be a
rectifiable normal path joining them. Then

$$d(\sigma(A), \sigma(B)) \le \ell_{\|\cdot\|}(\gamma). \tag{VI.40}$$

Proof. For convenience, let us choose the parameter $t$ to vary in $[0, 1]$.
For $0 \le r \le 1$, let $\gamma_r$ be that part of the curve which is parametrised by
$[0, r]$. Let

$$G = \{r \in [0, 1] : d(\sigma(A), \sigma(\gamma(r))) \le \ell_{\|\cdot\|}(\gamma_r)\}.$$

The theorem will be proved if we show that the point 1 is in $G$.

Since the function $\gamma$, the arclength, and the distance $d$ are all continuous
in their arguments, the set $G$ is closed. So it contains the point $g = \sup G$.
We have to show that $g = 1$.

Suppose $g < 1$. Let $S = \gamma(g)$. Using Theorem VI.5.1, we can find a point
$t$ in $(g, 1]$ such that, if $T = \gamma(t)$, then $d(\sigma(S), \sigma(T)) \le \|S - T\|$. But then

$$d(\sigma(A), \sigma(\gamma(t))) \le d(\sigma(A), \sigma(S)) + d(\sigma(S), \sigma(T)) \le \ell_{\|\cdot\|}(\gamma_g) + \|S - T\| \le \ell_{\|\cdot\|}(\gamma_t).$$

By the definition of $g$, this is not possible. ■


An effective estimate of $d(\sigma(A), \sigma(B))$ can thus be obtained if one could
find the length of the shortest normal path joining $A$ and $B$. This is a
difficult problem since the geometry of the set N is poorly understood.
However, the theorem above does have several interesting consequences.


Exercise VI.5.3 Let $A, B \in$ N. Then the line segment joining $A$ and $B$
lies in N if and only if $A - B$ is in N.


Theorem VI.5.4 If $A, B$ are normal matrices such that $A - B$ is also
normal, then $d(\sigma(A), \sigma(B)) \le \|A - B\|$.


Proof. The path $\gamma$ consisting of the line segment joining $A, B$ is a normal
path by Exercise VI.5.3. Its length is $\|A - B\|$. ■


For Hermitian matrices A, B, the condition of the theorem above is sat- 
isfied. So this theorem includes Weyl’s perturbation theorem as a special 
case. 

A more substantial application of Theorem VI.5.2 is obtained as follows.
It turns out that there exist normal matrices $A, B$ for which $A - B$ is not
normal, but there exists a normal path that joins them and has length
$\|A - B\|$. Note that this path cannot be the line segment joining $A$ and
$B$; however, it has the same length as the line segment. What makes this
possible is the fact that the metric under consideration is not Euclidean, and
so geodesics need not always be straight lines. (Of course, by the definition
of the arclength and the triangle inequality no path joining $A, B$ could have
length smaller than $\|A - B\|$.)

Let $S$ be any subset of $M(n)$. We will say that $S$ is metrically flat in
the metric induced by the norm $\|\cdot\|$ if any two points $A, B$ of $S$ can be
joined by a path that lies entirely within $S$ and has length $\|A - B\|$. To
emphasize the dependence on the norm $\|\cdot\|$, we will also call such a set
$\|\cdot\|$-flat.

Of course, every affine set is metrically flat. A nontrivial example of a
$\|\cdot\|$-flat set is given by the theorem below. Let U be the set of $n \times n$ unitary
matrices and $\mathbb{C} \cdot$U the set of all constant multiples of unitary matrices.


Theorem VI.5.5 The set $\mathbb{C} \cdot$U is $\|\cdot\|$-flat.


Proof. First note that $\mathbb{C} \cdot$U consists of just nonnegative real multiples
of unitary matrices. Let $A_0 = r_0 U_0$ and $A_1 = r_1 U_1$ be any two elements
of this set, where $r_0, r_1 \ge 0$. Choose an orthonormal basis in which the
unitary matrix $U_1 U_0^{-1}$ is diagonal:

$$U_1 U_0^{-1} = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n}),$$

where

$$|\theta_n| \le \cdots \le |\theta_1| \le \pi.$$

Reduction to such a form can be achieved by a unitary conjugation. Such
a process changes neither eigenvalues nor norms. So, we may assume that
all matrices are written with respect to the above orthonormal basis. Let

$$K = \mathrm{diag}(i\theta_1, \ldots, i\theta_n).$$

Then, $K$ is a skew-Hermitian matrix whose eigenvalues are in the interval
$(-i\pi, i\pi]$. We have

$$\|A_0 - A_1\| = \|r_0 I - r_1 U_1 U_0^{-1}\| = \max_j |r_0 - r_1 \exp(i\theta_j)| = |r_0 - r_1 \exp(i\theta_1)|.$$

This last quantity is the length of the straight line joining the points $r_0$
and $r_1 \exp(i\theta_1)$ in the complex plane. Parametrise this line segment as
$r(t) \exp(it\theta_1)$, $0 \le t \le 1$. This can be done except when $|\theta_1| = \pi$, an
exceptional case to which we will return later. The equation above can
then be written as

$$\|A_0 - A_1\| = \int_0^1 \big| [r(t) \exp(it\theta_1)]' \big| \, dt = \int_0^1 |r'(t) + r(t)\, i\theta_1| \, dt.$$

Now let $A(t) = r(t) \exp(tK) U_0$, $0 \le t \le 1$. This is a smooth curve in
$\mathbb{C} \cdot$U with endpoints $A_0$ and $A_1$. The length of this curve is

$$\int_0^1 \|A'(t)\| \, dt = \int_0^1 \|r'(t) \exp(tK) U_0 + r(t) K \exp(tK) U_0\| \, dt = \int_0^1 \|r'(t) I + r(t) K\| \, dt,$$

since $\exp(tK) U_0$ is a unitary matrix. But

$$\|r'(t) I + r(t) K\| = \max_j |r'(t) + i r(t) \theta_j| = |r'(t) + i r(t) \theta_1|.$$

Putting the last three equations together, we see that the path $A(t)$ joining
$A_0$ and $A_1$ has length $\|A_0 - A_1\|$.

The exceptional case $|\theta_1| = \pi$ is much simpler. The piecewise linear path
that joins $A_0$ to 0 and then to $A_1$ has length $r_0 + r_1$. This is equal to
$|r_0 - r_1 \exp(i\theta_1)|$ and hence to $\|A_0 - A_1\|$. ■
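The path constructed in this proof can be sampled numerically. The sketch below is plain Python and works in the diagonalising basis with $U_0 = I$, so every $A(t)$ is diagonal and its operator norm is the largest modulus of a diagonal entry. The explicit formula for $r(t)$ is an assumption made here (it is the polar equation of the chord from $r_0$ to $r_1 e^{i\theta_1}$, valid for $r_0, r_1 > 0$ and $0 < |\theta_1| < \pi$); the function names and sample values are illustrative.

```python
import cmath
import math

def flat_path(r0, r1, thetas):
    """Return t -> diagonal of A(t) = r(t) exp(tK), in the basis where U0 = I.

    thetas[0] must have the largest modulus, with 0 < |thetas[0]| < pi, and r0, r1 > 0.
    """
    th1 = thetas[0]

    def r(t):
        # Radius of the point at angle t*th1 on the segment from r0 to r1*exp(i*th1).
        return r0 * r1 * math.sin(th1) / (
            r0 * math.sin(t * th1) + r1 * math.sin((1 - t) * th1))

    return lambda t: [r(t) * cmath.exp(1j * t * th) for th in thetas]

def polygonal_length(path, m):
    """Sum of operator norms of increments over a uniform partition into m pieces."""
    pts = [path(k / m) for k in range(m + 1)]
    return sum(max(abs(b - a) for a, b in zip(p, q)) for p, q in zip(pts, pts[1:]))

r0, r1, thetas = 2.0, 0.5, [2.5, -1.0, 0.3]
path = flat_path(r0, r1, thetas)
distance = abs(r0 - r1 * cmath.exp(1j * thetas[0]))   # ||A0 - A1||
print(polygonal_length(path, 1000), distance)
```

The polygonal sums are bounded below by $\|A_0 - A_1\|$ (triangle inequality) and above by the arclength, so for this path they agree with the distance up to rounding, whatever the partition.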


Using Theorems VI.5.2 and VI.5.5, we obtain another proof of the result
of Exercise VI.3.12.

Exercise VI.5.6 Let $A, B$ be normal matrices whose eigenvalues lie on
concentric circles $C(A)$ and $C(B)$, respectively. Show that $d(\sigma(A), \sigma(B)) \le
\|A - B\|$.


Theorem VI.5.7 The set N consisting of $n \times n$ normal matrices is $\|\cdot\|$-flat
if and only if $n \le 2$.


Proof. Let $A, B$ be $2 \times 2$ normal matrices. If the eigenvalues of $A$ and
those of $B$ lie on two parallel lines, we may assume that these lines are
parallel to the real axis. Then the skew-Hermitian part of $A - B$ is a scalar,
and hence $A - B$ is normal. The straight line joining $A$ and $B$ then lies
in N. If the eigenvalues do not lie on parallel lines, then they lie on two
concentric circles. If $\alpha$ is the common centre of these circles, then $A$ and $B$
are in the set $\alpha + \mathbb{C} \cdot$U. This set is $\|\cdot\|$-flat. Thus, in either case, $A$ and $B$
can be joined by a normal path of length $\|A - B\|$.

If $n \ge 3$, then N cannot be $\|\cdot\|$-flat because of Theorem VI.5.2 and
Example VI.3.13. ■


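Example VI.5.8 Here is an example of a Hermitian matrix $A$ and a
skew-Hermitian matrix $B$ that cannot be joined by a normal path of length
$\|A - B\|$. Let

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}.$$

Then $\|A - B\| = 2$. If there were a normal path of length 2 joining $A, B$, then
the midpoint of this path would be a normal matrix $C$ such that $\|A - C\| =
\|B - C\| = 1$. Since each entry of a matrix is dominated by its norm, this
implies that $|c_{21} - 1| \le 1$ and $|c_{21} + 1| \le 1$. Hence $c_{21} = 0$. By the same
argument, $c_{32} = 0$. So

$$A - C = \begin{pmatrix} * & * & * \\ 1 & * & * \\ * & 1 & * \end{pmatrix},$$

where $*$ represents an entry whose value is not yet known. But if $\|A - C\| =
1$, every row and every column of $A - C$ has Euclidean norm at most 1, so we must have

$$A - C = \begin{pmatrix} 0 & 0 & * \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

Hence

$$C = \begin{pmatrix} 0 & 1 & * \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

But then $C$ could not have been normal.


VI.6 Geometry and Spectral Variation: wui
Norms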


In this section we consider the possibility of extending to all (weakly)
unitarily invariant norms, results obtained in the previous section for the
operator norm. Given a wui norm $\tau$, the $\tau$-optimal matching distance
between the eigenvalues of two (normal) matrices $A, B$ is defined as

$$d_\tau(\sigma(A), \sigma(B)) = \min_{P} \tau(\mathrm{Eig}\, A - P(\mathrm{Eig}\, B)P^{-1}), \tag{VI.41}$$

where, as before, $\mathrm{Eig}\, A$ is a diagonal matrix with eigenvalues of $A$ down its
diagonal (in any order) and where $P$ varies over all permutation matrices.
We want to compare this with the distance $\tau(A - B)$. The main result in
this section is an extension of the path inequality in Theorem VI.5.2 to all
wui norms. From this several interesting conclusions can be drawn.

Let us begin with an example that illustrates that not all results for the
operator norm have straightforward extensions.


Example VI.6.1 For $0 \le t \le \pi$, let $U(t) = \begin{pmatrix} 0 & 1 \\ e^{it} & 0 \end{pmatrix}$. Then,

$$|||U(t) - U(0)||| = |1 - e^{it}| = 2 \sin \frac{t}{2},$$

for every unitarily invariant norm. In the trace norm (the Schatten 1-norm),
we have

$$d_1(\sigma(U(t)), \sigma(U(0))) = 2\,|1 - e^{it/2}| = 4 \sin \frac{t}{4}.$$

So,

$$\frac{d_1(\sigma(U(t)), \sigma(U(0)))}{\|U(t) - U(0)\|_1} = \sec \frac{t}{4} > 1, \quad \text{for } t \ne 0.$$

Thus, we might have $d_1(\sigma(A), \sigma(B)) > \|A - B\|_1$, even for arbitrarily close
normal matrices $A, B$. Compare this with Theorems VI.5.1 and VI.4.1.
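The ratio $\sec(t/4)$ can be confirmed numerically. The sketch below assumes NumPy and takes $U(t)$ to be the $2 \times 2$ matrix with rows $(0, 1)$ and $(e^{it}, 0)$, which is consistent with the norms and eigenvalues stated in the example; the helper names are illustrative.

```python
import numpy as np

def U(t):
    """The unitary matrix with rows (0, 1) and (exp(it), 0); eigenvalues +-exp(it/2)."""
    return np.array([[0, 1], [np.exp(1j * t), 0]])

def trace_norm(X):
    """Schatten 1-norm: the sum of the singular values."""
    return np.linalg.svd(X, compute_uv=False).sum()

def d1(t):
    """Trace-norm optimal matching distance between sigma(U(t)) and sigma(U(0)) = {1, -1}."""
    lam = np.linalg.eigvals(U(t))
    mu = np.array([1.0, -1.0])
    # For n = 2 there are only two pairings to compare.
    return min(abs(lam - mu).sum(), abs(lam - mu[::-1]).sum())

def ratio(t):
    return d1(t) / trace_norm(U(t) - U(0))

for t in (0.1, 1.0, 2.0, 3.0):
    print(t, ratio(t), 1 / np.cos(t / 4))
```

Since $U(t) - U(0)$ has rank one, its trace norm coincides with its operator norm, which is why the same value $2 \sin(t/2)$ appears for every unitarily invariant norm.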


The Q-norms are special in this respect, as we will see below.

Let $\Phi$ be any finite subset of $\mathbb{C}$. A map $F : \mathbb{C} \to \Phi$ is called a retraction
onto $\Phi$ if $|z - F(z)| = \mathrm{dist}(z, \Phi)$, i.e., $F$ maps every point in $\mathbb{C}$ to one of the
points in $\Phi$ that is at the least distance from it. Such an $F$ is not unique if
$\Phi$ has more than one element.

Let $\Phi$ be a subset of $\mathbb{C}$ that has at most $n$ elements, and let N($\Phi$) be the
set of all $n \times n$ normal matrices $A$ such that $\sigma(A) \subset \Phi$. If $F$ is a retraction
onto $\Phi$, then for every normal matrix $B$ with eigenvalues $\{\beta_1, \ldots, \beta_n\}$ and
for every $A$ in N($\Phi$) we have

$$\|B - F(B)\| = \max_j |\beta_j - F(\beta_j)| = \max_j \mathrm{dist}(\beta_j, \Phi) \le s(\sigma(B), \sigma(A)) \le \|B - A\| \tag{VI.42}$$

by Theorem VI.3.3. Note that the normality of $B$ was required at the first
step and that of $A$ at the last. This inequality has a generalisation.


Theorem VI.6.2 Let $\Phi$ be a finite set of cardinality at most $n$. Let $F$ be a
retraction onto $\Phi$. Then for every normal matrix $B$ and for every $A \in$ N($\Phi$)
we have

$$\|B - F(B)\|_Q \le \|B - A\|_Q \tag{VI.43}$$

for every Q-norm.


Proof. By Exercise IV.2.10, the inequality (VI.43) is equivalent to the
weak majorisation

$$[s(B - F(B))]^2 \prec_w [s(A - B)]^2.$$

If $\beta_1, \ldots, \beta_n$ are the eigenvalues of $B$, this is equivalent to saying that for
all $1 \le k \le n$ we have

$$\sum_{j=1}^{k} |\beta_{i_j} - F(\beta_{i_j})|^2 \le \sum_{j=1}^{k} s_j^2(A - B)$$

for every choice of indices $i_1, \ldots, i_k$.
By Ky Fan's maximum principle (Exercise II.1.13)

$$\sum_{j=1}^{k} s_j^2(A - B) = \max \sum_{j=1}^{k} \|(A - B)v_j\|^2,$$

where the maximum is taken over all orthonormal $k$-tuples $v_1, \ldots, v_k$. In
particular, if $e_j$ are orthonormal eigenvectors of $B$ with $Be_j = \beta_j e_j$, then

$$\sum_{j=1}^{k} s_j^2(A - B) \ge \sum_{j=1}^{k} \|(A - \beta_{i_j}) e_{i_j}\|^2.$$

But if $\beta$ is any complex number and $e$ any unit vector, then $\|(A - \beta)e\| \ge
\mathrm{dist}(\beta, \sigma(A))$. (See the second proof of Theorem VI.3.3.) Hence, we have

$$\sum_{j=1}^{k} s_j^2(A - B) \ge \sum_{j=1}^{k} |\beta_{i_j} - F(\beta_{i_j})|^2,$$

and this completes the proof. ■


Exercise VI.6.3 Show that the assertion of Theorem VI.6.2 is not true
for the Schatten $p$-norms, $1 \le p < 2$. (See the example in (VI.23).)

Corollary VI.6.4 Let $A$ be a normal matrix, and let $B$ be another normal
matrix such that $\|A - B\|$ is smaller than half the distance between the
distinct eigenvalues of $A$. Then

$$d_Q(\sigma(A), \sigma(B)) \le \|A - B\|_Q$$

for every Q-norm. (The quantity on the left is the Q-norm optimal matching
distance.)


Proof. Let $\varepsilon = \|A - B\|$. In the proof of Theorem VI.5.1 we saw that
all the eigenvalues of $B$ lie within the region made up of the disks of radius
$\varepsilon$ around the eigenvalues of $A$. Further, each such disk contains as many
eigenvalues of $A$ as of $B$ (multiplicities counted). The retraction $F$ of
Theorem VI.6.2 then achieves a one-to-one pairing of the eigenvalues of $A$ and
those of $B$. ■


Replacing the operator norm by any other norm $\tau$ in (VI.38), we can
define the $\tau$-length of a path $\gamma$ by the same formula. Denote this by $\ell_\tau(\gamma)$.


Exercise VI.6.5 Let $A$ and $B$ be normal matrices, and let $\gamma$ be a normal
path joining them. Then for every Q-norm we have

\[
d_Q(\sigma(A), \sigma(B)) \le \ell_Q(\gamma).
\]

This includes Theorem VI.5.2 as a special case.

We will now extend this inequality to its broadest context.


Proposition VI.6.6 Let $A$ be a normal matrix and let $\delta$ be half the minimum
distance between distinct eigenvalues of $A$. Then there exists a positive
number $M$ (depending on $\delta$ and the dimension $n$) such that any normal
matrix $B$ with $\|A - B\| < \delta$ has a representation $B = UB'U^*$, where $B'$
commutes with $A$ and $U$ is a unitary matrix with $\|I - U\| \le M\|A - B\|$.


Proof. Let $\alpha_j$, $1 \le j \le r$, be the distinct eigenvalues of $A$, and let $m_j$ be
the multiplicity of $\alpha_j$. Choose an orthonormal basis in which $A = \oplus_j \alpha_j I_j$,
where $I_j$, $1 \le j \le r$, are identity submatrices of dimensions $m_j$. By the
argument used in the proof of Theorem VI.5.1, the eigenvalues of $B$ can be
grouped into diagonal blocks $D_j$, where $D_j$ has dimension $m_j$ and every
eigenvalue of $D_j$ is within distance $\delta$ of $\alpha_j$. This implies that

\[
\|(\alpha_j I_k - D_k)^{-1}\| \le \frac{1}{\delta} \quad \text{if } j \ne k.
\]

If $D = \oplus_j D_j$, then there exists a unitary matrix $W$ such that $B = WDW^*$.
With respect to the above splitting, let $W$ have the block decomposition
$W = [W_{jk}]$, $1 \le j, k \le r$. Then

\[
\|A - B\| = \|A - WDW^*\| = \|AW - WD\| = \|[W_{jk}(\alpha_j I_k - D_k)]\|.
\]

Hence, for $j \ne k$,

\[
\|W_{jk}\| \le \|(\alpha_j I_k - D_k)^{-1}\| \, \|A - B\| \le \frac{1}{\delta}\|A - B\|.
\]

Hence, there exists a constant $K$ that depends only on $\delta$ and $n$ such that

\[
\|W - \oplus_j W_{jj}\| \le K\|A - B\|.
\]

Let $X = \oplus_j W_{jj}$. This is the diagonal part in the block decomposition of
$W$. Hence, $\|X\| \le 1$ by the pinching inequality. Let $W_{jj} = V_j P_j$ be the
polar decomposition of $W_{jj}$ with $V_j$ unitary and $P_j$ positive. Then

\[
\|W_{jj} - V_j\| = \|P_j - I_j\| \le \|P_j^2 - I_j\|,
\]

since $P_j$ is a contraction. Let $V = \oplus_j V_j$. Then $V$ is unitary and from the
above inequality, we see that

\[
\|X - V\| \le \|X^*X - I\| = \|X^*X - W^*W\|.
\]

Hence,

\[
\begin{aligned}
\|W - V\| &\le \|W - X\| + \|X - V\| \le \|W - X\| + \|X^*X - W^*W\| \\
&\le \|W - X\| + \|(X^* - W^*)X\| + \|W^*(X - W)\| \\
&\le 3\|W - X\| \le 3K\|A - B\|.
\end{aligned}
\]

If we put $U = WV^*$ and $M = 3K$, we have $\|I - U\| \le M\|A - B\|$ and
$B = WDW^* = UVDV^*U^* = UB'U^*$, where $B' = VDV^*$. Since $B'$ is
block-diagonal with diagonal blocks of size $m_j$, it commutes with $A$. This
completes the proof. ■


Proposition VI.6.7 Given a normal matrix $A$, a wui norm $\tau$ and an
$\varepsilon > 0$, there exists a small neighbourhood of $A$ such that for any normal
matrix $B$ in this neighbourhood we have

\[
d_\tau(\sigma(A), \sigma(B)) \le (1 + \varepsilon)\,\tau(A - B).
\]
Proof. Choose $B$ so close to $A$ that the conditions of Proposition VI.6.6
are satisfied. Let $U$, $B'$, $M$ be as in that proposition.
Let $S = U - I$, so that $U = I + S$ and $U^* = I - S + S^2U^*$. Then

\[
A - B = A - B' + [B', S] + UB'SU^* - B'S.
\]

Hence,

\[
\tau(A - B' + [B', S]) \le \tau(A - B) + \tau(UB'SU^* - B'S).
\]

Since $A$ and $B'$ are commuting normal matrices, they can be diagonalised
simultaneously by a unitary conjugation. In this basis the diagonal of $[B', S]$
will be zero. So, by the pinching inequality

\[
\tau(A - B') \le \tau(A - B' + [B', S]).
\]

The two inequalities above give

\[
d_\tau(\sigma(A), \sigma(B)) \le \tau(A - B) + \tau(UB'SU^* - B'S).
\]

Now choose $k$ such that

\[
\frac{1}{k} \le \frac{\tau(X)}{\|X\|} \le k \quad \text{for all } X.
\]

Then, using Proposition VI.6.6, we get

\[
\tau(UB'SU^* - B'S) \le 2\tau(B'S) \le 2kM\|B\| \, \|A - B\| \le 2k^2M\|B\| \, \tau(A - B).
\]

Now, if $B$ is so close to $A$ that we have $2k^2M\|B\| < \varepsilon$, then the inequality
of the proposition is valid. ■


Theorem VI.6.8 Let $A$, $B$ be normal matrices, and let $\gamma$ be any normal
path joining them. Then there exists a permutation matrix $P$ such that for
every wui norm $\tau$ we have

\[
\tau(\operatorname{Eig} A - P(\operatorname{Eig} B)P^{-1}) \le \ell_\tau(\gamma). \tag{VI.44}
\]


Proof. For convenience, let $\gamma(t)$ be parametrised on the interval $[0, 1]$. Let
$\gamma(0) = A$, $\gamma(1) = B$. By Theorem VI.1.4, there exist continuous functions
$\lambda_1(t), \ldots, \lambda_n(t)$ that represent the eigenvalues of the matrix $\gamma(t)$ for each $t$.
Let $D(t)$ be the diagonal matrix with diagonal entries $\lambda_j(t)$. We will show
that

\[
\tau(D(0) - D(1)) \le \ell_\tau(\gamma). \tag{VI.45}
\]

Let $\tau$ be any wui norm, and let $\varepsilon$ be any positive number. Let $\gamma[s, t]$
denote the part of the path $\gamma(\cdot)$ that is defined on $[s, t]$. Let

\[
G = \{t : \tau(D(0) - D(t)) \le (1 + \varepsilon)\,\ell_\tau(\gamma[0, t])\}. \tag{VI.46}
\]

Because of continuity, $G$ is a closed set and hence it includes its supremum
$g$. We will show that $g = 1$. If this is not the case, then we can choose
$g' > g$ so close to $g$ that Proposition VI.6.7 guarantees

\[
\tau(D(g) - PD(g')P^{-1}) \le (1 + \varepsilon)\,\tau(\gamma(g) - \gamma(g')), \tag{VI.47}
\]

for some permutation matrix $P$. Now note that

\[
\tau(D(g) - P^{-1}D(g)P) \le \tau(D(g') - D(g)) + \tau(D(g) - PD(g')P^{-1}),
\]

and hence if $g'$ is sufficiently close to $g$, we will have $\tau(D(g) - P^{-1}D(g)P)$
small relative to the minimum distance between the distinct eigenvalues of
$D(g)$. We thus have $D(g) = P^{-1}D(g)P$. Hence

\[
\tau(D(g) - D(g')) = \tau(P^{-1}D(g)P - D(g')) = \tau(D(g) - PD(g')P^{-1}).
\]

So, from (VI.47),

\[
\tau(D(g) - D(g')) \le (1 + \varepsilon)\,\tau(\gamma(g) - \gamma(g')).
\]

From the definition of $g$ as the supremum of the set $G$ in (VI.46), we have

\[
\tau(D(0) - D(g)) \le (1 + \varepsilon)\,\ell_\tau(\gamma[0, g]).
\]

Combining the two inequalities above, we get

\[
\tau(D(0) - D(g')) \le (1 + \varepsilon)\,\ell_\tau(\gamma[0, g']).
\]

This contradicts the definition of $g$. So $g = 1$. ■


The inequality (VI.45) tells us not only that for all normal $A$, $B$ and for
all wui norms $\tau$ we have

\[
d_\tau(\sigma(A), \sigma(B)) \le \ell_\tau(\gamma), \tag{VI.48}
\]

but also that a matching of $\sigma(A)$ with $\sigma(B)$ can be chosen which makes
this work simultaneously for all $\tau$. Further, this matching is the natural
one obtained by following the curves $\lambda_j(t)$ that describe the eigenvalues of
the family $\gamma(t)$.


Several corollaries can be obtained now. 


Theorem VI.6.9 Let $A$, $B$ be unitary matrices, and let $K$ be any
skew-Hermitian matrix such that $BA^{-1} = \exp K$. Then, for every unitarily
invariant norm $|||\cdot|||$, we have

\[
d_{|||\cdot|||}(\sigma(A), \sigma(B)) \le |||K|||. \tag{VI.49}
\]

Proof. Let $\gamma(t) = (\exp tK)A$, $0 \le t \le 1$. Then $\gamma(t)$ is unitary for all
$t$, $\gamma(0) = A$, $\gamma(1) = B$. So, by Theorem VI.6.8,

\[
d_{|||\cdot|||}(\sigma(A), \sigma(B)) \le \int_0^1 |||\gamma'(t)||| \, dt.
\]

But $\gamma'(t) = K(\exp tK)A$. So $|||\gamma'(t)||| = |||K|||$. ■

Theorem VI.6.10 Let $A$, $B$ be unitary matrices. Then for every unitarily
invariant norm

\[
d_{|||\cdot|||}(\sigma(A), \sigma(B)) \le \frac{\pi}{2}\,|||A - B|||. \tag{VI.50}
\]

Proof. In view of Theorem VI.6.9, we need to show that

\[
\inf\{|||K||| : BA^{-1} = \exp K\} \le \frac{\pi}{2}\,|||A - B|||.
\]

Choose a $K$ whose eigenvalues are contained in the interval $(-i\pi, i\pi]$. By
applying a unitary conjugation, we may assume that $K = \operatorname{diag}(i\theta_1, \ldots, i\theta_n)$.
Then

\[
|||A - B||| = |||I - BA^{-1}||| = |||\operatorname{diag}(1 - e^{i\theta_1}, \ldots, 1 - e^{i\theta_n})|||.
\]

But if $-\pi < \theta \le \pi$, then $|\theta| \le \frac{\pi}{2}|1 - e^{i\theta}|$. Hence, $|||K||| \le \frac{\pi}{2}|||A - B|||$ for
every unitarily invariant norm. ■
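The scalar inequality used in the last step, $|\theta| \le \frac{\pi}{2}|1 - e^{i\theta}|$ on $(-\pi, \pi]$, can be sanity-checked numerically. The sketch below is only an illustration on a finite grid, not a proof; the grid size `N` and the helper name `chord` are arbitrary choices made for this example.

```python
import cmath
import math

def chord(theta):
    """|1 - e^{i theta}|, the chord length between 1 and e^{i theta}."""
    return abs(1 - cmath.exp(1j * theta))

# Sample theta on a grid in (-pi, pi] and record the largest ratio |theta| / chord.
N = 2000
thetas = [-math.pi + 2 * math.pi * (k + 1) / N for k in range(N)]
worst_ratio = max(abs(t) / chord(t) for t in thetas if chord(t) > 1e-12)

print(worst_ratio, math.pi / 2)
```

On this grid the largest ratio occurs at $\theta = \pi$, where the arc is a half circle and the chord is a diameter, which is consistent with $\pi/2$ being the best constant in the scalar inequality.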


We now give an example to show that the factor $\pi/2$ in the inequality
(VI.50) cannot be reduced if the inequality is to hold for all unitarily
invariant norms and all dimensions. Recall that for the operator norm and
for the Frobenius norm we have the stronger inequality with 1 instead of
$\pi/2$ (Theorem VI.3.11 and Theorem VI.4.1).


Example VI.6.11 Let $A_+$ and $A_-$ be the unitary matrices obtained by
adding an entry $\pm 1$ in the bottom left corner to an upper Jordan matrix,
i.e.,

\[
A_\pm = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\pm 1 & 0 & 0 & \cdots & 0
\end{pmatrix}.
\]

Then for the trace norm we have $\|A_+ - A_-\|_1 = 2$. The eigenvalues of $A_\pm$
are the $n$th roots of $\pm 1$. One can see that the $\|\cdot\|_1$-optimal matching distance
between these two $n$-tuples approaches $\pi$ as $n \to \infty$.
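The limit claimed in the example can be explored numerically. The sketch below brute-forces the trace-norm optimal matching distance between the $n$th roots of $1$ and of $-1$ for small $n$, and compares it with the value $2n\sin(\pi/2n)$ obtained by pairing each root with an adjacent one; the function names are hypothetical, and exhaustive search over permutations is only practical for small $n$.

```python
import cmath
import math
from itertools import permutations

def roots_of(sign, n):
    """The n complex n-th roots of +1 (sign > 0) or of -1 (sign < 0)."""
    offset = 0.0 if sign > 0 else math.pi
    return [cmath.exp(1j * (offset + 2 * math.pi * k) / n) for k in range(n)]

def trace_norm_matching_distance(alphas, betas):
    """Minimum over all permutations s of sum_j |alpha_j - beta_s(j)|."""
    n = len(alphas)
    return min(
        sum(abs(alphas[j] - betas[s[j]]) for j in range(n))
        for s in permutations(range(n))
    )

def interlaced_value(n):
    """Cost 2 n sin(pi / (2n)) of pairing every root of 1 with a neighbouring root of -1."""
    return 2 * n * math.sin(math.pi / (2 * n))

for n in range(2, 8):
    d = trace_norm_matching_distance(roots_of(+1, n), roots_of(-1, n))
    print(n, round(d, 6), round(interlaced_value(n), 6))
```

Since $2n\sin(\pi/2n)$ increases to $\pi$ while $\|A_+ - A_-\|_1 = 2$ for every $n$, the ratio of the two sides approaches $\pi/2$, which is the point of the example.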


The next theorem is a generalisation of, and can be proved using the 
same idea as, Theorem VI.5.4. 


Theorem VI.6.12 If $A$, $B$ are normal matrices such that $A - B$ is also
normal, then for every wui norm $\tau$

\[
d_\tau(\sigma(A), \sigma(B)) \le \tau(A - B). \tag{VI.51}
\]


This inequality, or rather just its special case when $\tau$ is restricted to
unitarily invariant norms and $A$, $B$ are Hermitian, can be used to get yet
another proof of Lidskii's Theorem. We have seen this argument earlier
in Chapter IV. The stronger result we now have at our disposal gives a
stronger version of Lidskii's Theorem. This is shown below.

Let $x$, $y$ be elements of $\mathbb{C}^n$. We will say that $x$ is majorised by $y$, in
symbols $x \prec y$, if $x$ is a convex combination of vectors obtained from $y$ by
permuting its coordinates, i.e., $x = \sum a_\sigma y_\sigma$, a finite sum in which each $y_\sigma$ is
a vector whose coordinates are obtained by applying the permutation $\sigma$ to
the coordinates of $y$ and $a_\sigma$ are positive numbers with $\sum a_\sigma = 1$. When $x$, $y$
are real vectors, this is already familiar. We will say $x$ is softly majorised
by $y$, in symbols $x \prec_s y$, if we can write $x$ as a finite sum $x = \sum z_\sigma y_\sigma$ in
which $z_\sigma$ are complex numbers such that $\sum |z_\sigma| \le 1$.


Exercise VI.6.13 Let $x$, $y$ be two vectors in $\mathbb{C}^n$ such that $\sum_{j=1}^{n} x_j$ and
$\sum_{j=1}^{n} y_j$ are not zero. If $x \prec_s y$ and $\sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j$, then $x \prec y$.


Proposition VI.6.14 Let $A$, $B$ be $n \times n$ normal matrices and let $\lambda(A)$,
$\lambda(B)$ be two $n$-vectors whose coordinates are the eigenvalues of $A$, $B$,
respectively. Then $\tau(A) \le \tau(B)$ for all wui norms $\tau$ if and only if $\lambda(A) \prec_s \lambda(B)$.


Proof. Suppose $\tau(A) \le \tau(B)$ for all wui norms $\tau$. Then, using
Theorem IV.4.7, we can write the diagonal matrix $\operatorname{Eig}(A)$ as a finite sum
$\operatorname{Eig}(A) = \sum z_k U_k \operatorname{Eig}(B) U_k^*$, in which $U_k$ are unitary matrices and $\sum |z_k| \le 1$.
This shows that $\lambda(A) = \sum z_k S_k(\lambda(B))$, where each $S_k$ is an orthostochastic
matrix. (An orthostochastic matrix $S$ is a doubly stochastic matrix
such that $s_{ij} = |u_{ij}|^2$, where $u_{ij}$ are the entries of a unitary matrix.) By
Birkhoff's Theorem each $S_k$ is a convex combination of permutation
matrices. Hence, $\lambda(A) \prec_s \lambda(B)$. The converse follows by the same argument
without recourse to Birkhoff's Theorem. ■


Theorem VI.6.15 Let $A$, $B$ be normal matrices such that $A - B$ is also
normal. Then the eigenvalues of $A$ and $B$ can be arranged in such a way
that if $\lambda(A)$ and $\lambda(B)$ are the $n$-vectors with these eigenvalues as their
coordinates, then

\[
\lambda(A) - \lambda(B) \prec \lambda(A - B). \tag{VI.52}
\]


Proof. Use Theorem VI.6.8 and the observation in Theorem VI.6.12 to
conclude that we can arrange the eigenvalues in such a way that

\[
\tau(\operatorname{Eig} A - \operatorname{Eig} B) \le \tau(A - B)
\]

for every wui norm $\tau$. By Proposition VI.6.14, this is equivalent to saying

\[
\lambda(A) - \lambda(B) \prec_s \lambda(A - B),
\]

where $\lambda(A)$ is the vector whose entries are the diagonal entries of the
diagonal matrix $\operatorname{Eig} A$. By a small perturbation, if necessary, we may assume
that $\operatorname{tr} A \ne \operatorname{tr} B$. Since the components of the vectors $\lambda(A) - \lambda(B)$ and
$\lambda(A - B)$ must have the same (nonzero) sum, we have in fact majorisation
rather than just the soft majorisation proved above. ■
We can call this the Lidskii Theorem for normal matrices. It includes
the classical Lidskii Theorem as a special case.


Exercise VI.6.16 Let $A$, $B$ be normal matrices such that $A - B$ is also
normal. Let $\tau$ be any wui norm. Show that there exists a permutation matrix
$P$ such that

\[
\tau(A - B) \le \tau(\operatorname{Eig} A - P(\operatorname{Eig} B)P^{-1}). \tag{VI.53}
\]


VI.7 Some Inequalities for the Determinant 


The determinant of the sum A+ B of two matrices has no simple relation 
with the determinants of A and B. Some interesting inequalities can be 
derived using ideas introduced in this chapter. These are proved below. 


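Theorem VI.7.1 Let $A$ and $B$ be Hermitian matrices with eigenvalues
$\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$, respectively. Then

\[
\min_\sigma \prod_{i=1}^{n} (\alpha_i + \beta_{\sigma(i)}) \le \det(A + B) \le \max_\sigma \prod_{i=1}^{n} (\alpha_i + \beta_{\sigma(i)}), \tag{VI.54}
\]

where $\sigma$ varies over all permutations.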
Proof. If $A$ and $B$ commute, they can be diagonalised simultaneously,
and hence $\det(A + B) = \prod_{i=1}^{n} (\alpha_i + \beta_{\sigma(i)})$ for some $\sigma$. So, the inequality
(VI.54) is trivial in this case. Next note that the two extreme sides of (VI.54)
are invariant under the transformation $B \to UBU^*$ for every unitary $U$.
Hence, it suffices to prove that for a fixed Hermitian matrix $A$ the function
$f(H) = \det(A + H)$ on the unitary orbit $\mathcal{U}_B$ of another Hermitian matrix
$B$ attains its minimum and maximum at points that commute with $A$.
Let $B_0$ be any extreme point of $f$ on $\mathcal{U}_B$. Then, we must have

\[
\frac{d}{dt}\Big|_{t=0} \det(A + e^{tK} B_0 e^{-tK}) = 0, \tag{VI.55}
\]

for every skew-Hermitian $K$. Now,

\[
\det(A + e^{tK} B_0 e^{-tK}) = \det(A + B_0 + t[K, B_0]) + O(t^2).
\]

Note that, if $X$, $Y$ are any two matrices and $X$ is invertible, then

\[
\det(X + tY) = \det X \,(1 + t \operatorname{tr} YX^{-1}) + O(t^2).
\]

So, if $A + B_0$ is invertible, the condition (VI.55) reduces to

\[
\operatorname{tr}\, [K, B_0](A + B_0)^{-1} = 0.
\]

This is equivalent to saying

\[
\operatorname{tr} K\big(B_0(A + B_0)^{-1} - (A + B_0)^{-1}B_0\big) = 0.
\]

If this is to be true for all skew-Hermitian $K$, we must have

\[
B_0(A + B_0)^{-1} = (A + B_0)^{-1}B_0.
\]

Thus $B_0$ commutes with $(A + B_0)^{-1}$, hence with $A + B_0$, and hence with
$A$.

This proves the theorem under the assumption that $A + B_0$ is invertible.
The general case follows from this by a limiting argument. ■
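A quick numerical illustration of (VI.54) in the smallest nontrivial case may help. The sketch below takes a real diagonal $A$ and moves $B$ along part of its unitary orbit by plane rotations; the eigenvalue choices and the name `det_sum` are assumptions made only for this example, and real rotations cover only a one-parameter slice of the orbit.

```python
import math

def det_sum(a, b, theta):
    """det(A + B) for A = diag(a) and B = R diag(b) R^T, with R a rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    b11 = b[0] * c * c + b[1] * s * s      # entries of R diag(b) R^T
    b22 = b[0] * s * s + b[1] * c * c
    b12 = (b[0] - b[1]) * c * s
    return (a[0] + b11) * (a[1] + b22) - b12 * b12

a = (3.0, -1.0)   # eigenvalues of A (illustrative values)
b = (2.0, 0.5)    # eigenvalues of B (illustrative values)

# The two permutation products on the extreme sides of (VI.54) when n = 2.
products = [(a[0] + b[0]) * (a[1] + b[1]), (a[0] + b[1]) * (a[1] + b[0])]
lower, upper = min(products), max(products)

dets = [det_sum(a, b, 2 * math.pi * k / 360) for k in range(360)]
print(lower, min(dets), max(dets), upper)
```

In this example the extreme values are attained at rotations by multiples of a right angle, that is, exactly where $B$ is diagonal and commutes with $A$, in line with the proof.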


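Exercise VI.7.2 Let $A$ and $B$ be Hermitian matrices. If $\lambda_n^{\downarrow}(A) + \lambda_n^{\downarrow}(B) \ge 0$,
then

\[
\prod_{j=1}^{n} \big(\lambda_j^{\downarrow}(A) + \lambda_j^{\downarrow}(B)\big) \le \det(A + B) \le \prod_{j=1}^{n} \big(\lambda_j^{\downarrow}(A) + \lambda_j^{\uparrow}(B)\big). \tag{VI.56}
\]

This is true, in particular, when $A$ and $B$ are positive matrices.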


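Theorem VI.7.3 Let $A$, $B$ be Hermitian matrices with eigenvalues $\alpha_j$ and
$\beta_j$, respectively, ordered so that

\[
|\alpha_1| \ge \cdots \ge |\alpha_n| \quad \text{and} \quad |\beta_1| \ge \cdots \ge |\beta_n|.
\]

Let $T = A + iB$. Then

\[
|\det T| \le \prod_{j=1}^{n} |\alpha_j + i\beta_{n-j+1}|. \tag{VI.57}
\]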


Proof. The function $f(t) = \frac{1}{2}\log t$ is concave on the positive half-line.
Hence, using the majorisation (VI.7) and Corollary II.3.4, we have

\[
\sum_{j=1}^{n} \log |\alpha_j + i\beta_{n-j+1}| \ge \sum_{j=1}^{n} \log s_j.
\]

Hence,

\[
\prod_{j=1}^{n} |\alpha_j + i\beta_{n-j+1}| \ge \prod_{j=1}^{n} s_j = |\det T|. \qquad ■
\]


Proposition VI.7.4 Let $T = A + iB$, where $A$ is positive and $B$ Hermitian.
Then

\[
|\det T| = \det A \prod_{j=1}^{n} \big[1 + s_j(A^{-1/2}BA^{-1/2})^2\big]^{1/2}. \tag{VI.58}
\]
Proof. Since $T = A^{1/2}(I + iA^{-1/2}BA^{-1/2})A^{1/2}$, we have

\[
\det T = \det A \cdot \det(I + iA^{-1/2}BA^{-1/2}). \tag{VI.59}
\]

Note that

\[
\begin{aligned}
|\det(I + iA^{-1/2}BA^{-1/2})|^2
&= \det\big[(I + iA^{-1/2}BA^{-1/2})(I - iA^{-1/2}BA^{-1/2})\big] \\
&= \det\big[I + (A^{-1/2}BA^{-1/2})^2\big] \\
&= \prod_{j=1}^{n} \big[1 + s_j(A^{-1/2}BA^{-1/2})^2\big].
\end{aligned} \tag{VI.60}
\]

So, (VI.58) follows from (VI.59) and (VI.60). ■


Corollary VI.7.5 If the matrix $A$ in the Cartesian decomposition $T =
A + iB$ is positive, then $|\det T| \ge \det A$.


Theorem VI.7.6 Let $T = A + iB$, where $A$ and $B$ are positive matrices
with eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n$ and $\beta_1 \ge \cdots \ge \beta_n$, respectively. Then,

\[
|\det T| \ge \prod_{j=1}^{n} |\alpha_j + i\beta_j|. \tag{VI.61}
\]


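Proof. We may assume, without loss of generality, that both $A$, $B$ are
positive definite. Because of relations (VI.59) and (VI.60), the theorem will
be proved if we show

\[
\prod_{j=1}^{n} \big[1 + s_j(A^{-1/2}BA^{-1/2})^2\big] \ge \prod_{j=1}^{n} (1 + \alpha_j^{-2}\beta_j^2).
\]

Note that

\[
s_j(A^{-1/2}BA^{-1/2}) = s_j(A^{-1/2}B^{1/2})^2.
\]

From (III.20) we have

\[
\{\log s_{n-j+1}(A^{-1/2}) + \log s_j(B^{1/2})\}_j \prec \{\log s_j(A^{-1/2}B^{1/2})\}_j.
\]

This is the same as saying

\[
\{\log(\alpha_j^{-1/2}\beta_j^{1/2})\}_j \prec \{\log s_j(A^{-1/2}B^{1/2})\}_j.
\]

Since the function $\log(1 + e^{4t})$ is convex in $t$, using Corollary II.3.4 we
obtain from the last majorisation

\[
\begin{aligned}
\sum_{j=1}^{n} \log(1 + \alpha_j^{-2}\beta_j^2)
&\le \sum_{j=1}^{n} \log\big(1 + s_j(A^{-1/2}B^{1/2})^4\big) \\
&= \sum_{j=1}^{n} \log\big(1 + s_j(A^{-1/2}BA^{-1/2})^2\big).
\end{aligned}
\]

This gives the desired inequality. ■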
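For positive $A$ and $B$, Corollary VI.7.5, Theorem VI.7.3 and Theorem VI.7.6 together bracket $|\det(A + iB)|$. The following sketch checks these bounds on one pair of real symmetric $2 \times 2$ matrices; the matrices and helper names are illustrative assumptions, and a single instance is of course no substitute for the proofs.

```python
import math

def eig_sym2(m):
    """Eigenvalues, in decreasing order, of a real symmetric 2x2 matrix [[p, q], [q, r]]."""
    (p, q), (_, r) = m
    mean, rad = (p + r) / 2, math.hypot((p - r) / 2, q)
    return mean + rad, mean - rad

def abs_det_cartesian(a, b):
    """|det(A + iB)| for real symmetric 2x2 matrices A and B."""
    t = [[a[i][j] + 1j * b[i][j] for j in range(2)] for i in range(2)]
    return abs(t[0][0] * t[1][1] - t[0][1] * t[1][0])

A = [[2.0, 1.0], [1.0, 3.0]]     # positive definite (illustrative)
B = [[1.0, -0.5], [-0.5, 4.0]]   # positive definite (illustrative)

a1, a2 = eig_sym2(A)
b1, b2 = eig_sym2(B)

lower = abs(a1 + 1j * b1) * abs(a2 + 1j * b2)   # right-hand side of (VI.61)
upper = abs(a1 + 1j * b2) * abs(a2 + 1j * b1)   # right-hand side of (VI.57)
value = abs_det_cartesian(A, B)
det_A = a1 * a2                                  # lower bound of Corollary VI.7.5

print(det_A, lower, value, upper)
```

The same-order pairing of eigenvalues gives the lower bound and the opposite-order pairing the upper bound, mirroring the situation for $\det(A + B)$ in Exercise VI.7.2.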
Exercise VI.7.7 Show that if only one of the two matrices $A$ and $B$ is
positive, then the inequality (VI.61) is not necessarily true.


A very natural generalisation of Theorem VI.7.1 would be the following
statement. If $A$ and $B$ are normal, then $\det(A + B)$ lies in the convex hull
of the products $\prod_{i=1}^{n} (\alpha_i + \beta_{\sigma(i)})$. This is called the Marcus-de Oliveira
Conjecture and is a well-known open problem in matrix theory.


VI.8 Problems 


Problem VI.8.1. Let $A$ be a Hermitian and $B$ a skew-Hermitian matrix.
Show that

\[
\|\operatorname{Eig}^{|\downarrow|}(A) - \operatorname{Eig}^{|\uparrow|}(B)\|_Q \le \|A - B\|_Q
\]

for every Q-norm.


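Problem VI.8.2. Let $T = A + iB$, where $A$ and $B$ are Hermitian. Show
that, for $2 \le p \le \infty$,

\[
\|T\|_p \le 2^{1/2 - 1/p} \min_\sigma \|\operatorname{Eig} A + \operatorname{Eig}_\sigma(iB)\|_p,
\]

\[
\max_\sigma \|\operatorname{Eig} A + \operatorname{Eig}_\sigma(iB)\|_p \le 2^{1/2 - 1/p} \|T\|_p,
\]

and that for $1 \le p \le 2$,

\[
\|T\|_p \le 2^{1/p - 1/2} \min_\sigma \|\operatorname{Eig} A + \operatorname{Eig}_\sigma(iB)\|_p,
\]

\[
\max_\sigma \|\operatorname{Eig} A + \operatorname{Eig}_\sigma(iB)\|_p \le 2^{1/p - 1/2} \|T\|_p.
\]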


Problem VI.8.3. A different proof of Theorem VI.3.11 is outlined below.
Fill in the details.

Let $n \ge 3$ and let $A$, $B$ be $n \times n$ unitary matrices. Assume that the
eigenvalues $\alpha_i$ and $\beta_i$ are distinct and the distances $|\alpha_i - \beta_j|$ are also
distinct. If $\gamma_1$, $\gamma_2$ are two points on the unit circle, we write $\gamma_1 < \gamma_2$ if the
minor arc from $\gamma_1$ to $\gamma_2$ goes counterclockwise. We write $(\alpha\beta\gamma)$ if the points
$\alpha$, $\beta$, $\gamma$ on the unit circle are in counterclockwise cyclic order. Number the
indices modulo $n$, e.g., $\alpha_{n+1} = \alpha_1$.

Label the eigenvalues of $A$ so as to have the order $(\alpha_1\alpha_2 \cdots \alpha_n)$. Let
$\delta = d(\sigma(A), \sigma(B))$. Assume that $\delta < 2$; otherwise, there is nothing to
prove. Label the eigenvalues of $B$ as $\beta_1, \ldots, \beta_n$ in such a way that for any
subset $J$ of $\{1, 2, \ldots, n\}$ and for any permutation $\sigma$

\[
\max_{i \in J} |\alpha_i - \beta_i| \le \max_{i \in J} |\alpha_i - \beta_{\sigma(i)}|.
\]

Then $\delta = \max_{1 \le i \le n} |\alpha_i - \beta_i|$. Assume, without loss of generality, that this
maximum is attained at $i = 1$ and that $\alpha_1 < \beta_1$. Check the following.

(i) If $\beta_i < \alpha_i$, then neither $(\alpha_1\beta_i\beta_1)$ nor $(\alpha_1\alpha_i\beta_1)$ is possible.
(ii) There exists $j$ such that $|\alpha_{j+1} - \beta_j| > \delta$. Choose and fix one such $j$.
(iii) We have $(\alpha_1\beta_1\beta_j\alpha_{j+1})$.
(iv) For $1 < i < j$ we have $(\beta_1\beta_i\beta_j)$.

Let $K_A$ be the arc from $\alpha_{j+1}$ positively to $\alpha_1$ and $K_B$ the arc from $\beta_1$
positively to $\beta_j$. Then there are $n - j + 1$ of the $\alpha_i$ in $K_A$ and $j$ of the $\beta_i$
in $K_B$. Use Proposition VI.3.5 now.


Problem VI.8.4. Let $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ be any complex numbers.
Show that there is a number $\gamma$ such that

\[
\max_j |\alpha_j - \gamma| + \max_j |\beta_j - \gamma| \le \sqrt{2}\, \max_{i,j} |\alpha_i - \beta_j|.
\]

(The proof might be long but is not too difficult.) Use this to get another
proof of Theorem VI.3.14.


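Problem VI.8.5. Let $A$ be a Hermitian and $B$ a normal matrix. If the
eigenvalues $\alpha_j$ of $A$ are enumerated as $\alpha_1 \ge \cdots \ge \alpha_n$ and if the eigenvalues
$\beta_j$ of $B$ are enumerated so that $\operatorname{Re} \beta_1 \ge \cdots \ge \operatorname{Re} \beta_n$, then

\[
\max_{1 \le j \le n} |\alpha_j - \beta_j| \le \sqrt{2}\, \|A - B\|.
\]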


Problem VI.8.6. Let $A$ be a normal matrix with eigenvalues $\alpha_1, \ldots, \alpha_n$.
Let $B$ be any other matrix and let $\varepsilon = \|A - B\|$. By Theorem VI.3.3,
all the eigenvalues of $B$ are contained in the set $D = \bigcup_j D(\alpha_j, \varepsilon)$. Use
the argument in the proof of Theorem VI.5.1 to show that each connected
component of $D$ contains as many eigenvalues of $B$ as of $A$. Use this and
the Matching Theorem (Theorem II.2.1) to show that

\[
d(\sigma(A), \sigma(B)) \le (2n - 1)\|A - B\|.
\]

[If $A$ and $B$ are both normal, this argument together with the result of
Problem II.5.10 shows that $d(\sigma(A), \sigma(B)) \le n\|A - B\|$. However, in this case,
the Hoffman-Wielandt inequality gives a stronger result: $d(\sigma(A), \sigma(B)) \le
\sqrt{n}\,\|A - B\|$. We will see in the next chapter that, in fact, $d(\sigma(A), \sigma(B)) <
3\|A - B\|$ in this case.]


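Problem VI.8.7. Let $A$ be a Hermitian matrix with eigenvalues $\alpha_1 \ge
\cdots \ge \alpha_n$, and let $B$ be any matrix with eigenvalues $\beta_j$ arranged so that
$\operatorname{Re} \beta_1 \ge \cdots \ge \operatorname{Re} \beta_n$. Let $\operatorname{Re} \beta_j = \mu_j$ and $\operatorname{Im} \beta_j = \nu_j$. Choose an orthonormal
basis in which $B$ is upper triangular and

\[
B = M + iN + iR,
\]

where $M = \operatorname{diag}(\mu_1, \ldots, \mu_n)$, $N = \operatorname{diag}(\nu_1, \ldots, \nu_n)$ and $R$ is strictly upper
triangular. Show that

\[
\|\operatorname{Im}(A - B)\|_2^2 = \|N\|_2^2 + \tfrac{1}{2}\|R\|_2^2.
\]

Hence,

\[
\sum_j |\nu_j|^2 \le \|\operatorname{Im}(A - B)\|_2^2.
\]

Show that

\[
\Big(\sum_j (\alpha_j - \mu_j)^2\Big)^{1/2} \le \|\operatorname{Re}(A - B)\|_2 + \frac{1}{\sqrt{2}}\|R\|_2.
\]

Combine the inequalities above to obtain

\[
\Big(\sum_j |\alpha_j - \beta_j|^2\Big)^{1/2} \le \sqrt{2}\, \|A - B\|_2.
\]

Compare this with the result of Problem VI.8.5; note that there $B$ was
assumed to be normal.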


Problem VI.8.8. It follows from the result of the above problem that if
$A$ is Hermitian and $B$ an arbitrary matrix, then

\[
d(\sigma(A), \sigma(B)) \le \sqrt{2n}\, \|A - B\|.
\]

The factor $\sqrt{2n}$ here can be replaced by another that grows only like $\log n$.
For this one needs the following fact, which we state without proof. (See
the discussion in the Notes at the end of the chapter.)

Let $Z$ be an $n \times n$ matrix whose eigenvalues are all real. Then

\[
\|Z - Z^*\| \le \gamma_n \|Z + Z^*\|,
\]


where 


The constant $\gamma_n$ is the smallest one for which the above norm inequality
is true. Approximating the sum by integrals, it is easy to see that $\gamma_n/\log n$
approaches $2/\pi$ as $n \to \infty$.

Using the notations of Problem 7, show that

\[
\max_j |\nu_j| \le \|\operatorname{Im} B\| = \|\operatorname{Im}(A - B)\|,
\]

\[
\max_j |\alpha_j - \mu_j| \le \|A - M\| \le \|\operatorname{Re}(A - B)\| + \tfrac{1}{2}\|R - R^*\|.
\]

Let $Z = N + R$. Then $Z$ has only real eigenvalues, $Z - Z^* = R - R^*$, and
$Z + Z^* = 2\operatorname{Im}(A - B)$. Hence,

\[
\max_j |\alpha_j - \beta_j| \le \|\operatorname{Re}(A - B)\| + (\gamma_n + 1)\|\operatorname{Im}(A - B)\|.
\]

This shows that

\[
d(\sigma(A), \sigma(B)) \le (\gamma_n + 2)\|A - B\|.
\]


Problem VI.8.9. Let $A$ be the Hermitian matrix with entries

\[
a_{ij} = \frac{1}{|i - j|} \ \text{ if } i \ne j, \qquad a_{ii} = 0 \ \text{ for all } i.
\]

Let $B = A + C$ where $C$ is the skew-Hermitian matrix with entries

\[
c_{ij} = \frac{1}{i - j} \ \text{ if } i \ne j, \qquad c_{ii} = 0 \ \text{ for all } i.
\]

Then $B$ is strictly lower triangular, hence all its eigenvalues are 0.

Show that $\|A - B\| < \pi$ for all $n$, and $\|A\| = O(\log n)$. (This needs some
work.) Since $A$ is Hermitian, this means that its spectral radius is $O(\log n)$.
Thus, in this example, $d(\sigma(A), \sigma(B)) = O(\log n)$ and $\|A - B\| < \pi$. So,
the bound obtained in Problem 8 is not too loose.


Problem VI.8.10. For any matrix $A$, let $A_D$ denote its diagonal part and
$A_L$, $A_U$ its parts below and above the diagonal. Thus $A = A_L + A_D + A_U$.
Show that if $A$ is an $n \times n$ normal matrix, then

\[
\|A_L\|_2 \le \sqrt{n - 1}\, \|A_U\|_2, \qquad \|A_U\|_2 \le \sqrt{n - 1}\, \|A_L\|_2.
\]

The example in VI.6.11 shows that this inequality is sharp.


Problem VI.8.11. Let $A$ be a normal and $B$ an arbitrary $n \times n$ matrix.
Choose an orthonormal basis in which $B$ is upper triangular. In this basis
write $A = A_L + A_D + A_U$, $B = B_D + B_U$. By the Hoffman-Wielandt
Theorem

\[
d_2(\sigma(A), \sigma(B)) \le \|A - B_D\|_2 = \|A - B + B_U\|_2.
\]

Note that

\[
A - B + B_U = (A - B)_L + (A - B)_D + A_U.
\]

Use the result of Problem VI.8.10 to show that

\[
\|A - B + B_U\|_2 \le \sqrt{n}\, \|A - B\|_2.
\]

Hence, we have, for $A$ normal and $B$ arbitrary,

\[
d_2(\sigma(A), \sigma(B)) \le \sqrt{n}\, \|A - B\|_2.
\]

From this, we get for the Schatten $p$-norms

\[
\begin{aligned}
d_p(\sigma(A), \sigma(B)) &\le n^{1/p}\, \|A - B\|_p, & 1 \le p \le 2, \\
d_p(\sigma(A), \sigma(B)) &\le n^{1 - 1/p}\, \|A - B\|_p, & 2 \le p \le \infty.
\end{aligned}
\]

Show that for $1 \le p \le 2$ these inequalities are sharp. (See Example VI.6.11.)
For $p = \infty$, this gives

\[
d(\sigma(A), \sigma(B)) \le n\|A - B\|,
\]

which is an improvement on the result of Problem 6 above.
If $A$ is Hermitian, then $\|A_U\|_2 = \|A_L\|_2$. Using this one obtains a slightly
different proof of the last inequality in Problem VI.8.7.


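Problem VI.8.12. Let $A$ be an $n \times n$ Hermitian matrix partitioned as
$A = \begin{pmatrix} M & R \\ R^* & N \end{pmatrix}$, where $M$ is a $k \times k$ matrix. Let the eigenvalues of $A$ be
$\lambda_1 \ge \cdots \ge \lambda_n$, those of $M$ be $\mu_1 \ge \cdots \ge \mu_k$, and let the singular values of
$R$ be $\rho_1 \ge \rho_2 \ge \cdots$. Show that there exist indices $1 \le i_1 < \cdots < i_k \le n$
such that for every symmetric gauge function $\Phi$ we have

\[
\Phi(\mu_1 - \lambda_{i_1}, \ldots, \mu_k - \lambda_{i_k}) \le \Phi(\rho_1, \rho_1, \rho_2, \rho_2, \ldots).
\]

In other words, for every unitarily invariant norm we have

\[
|||\operatorname{diag}(\mu_1 - \lambda_{i_1}, \ldots, \mu_k - \lambda_{i_k})||| \le |||R \oplus R|||.
\]

In particular, we have

\[
\|\operatorname{diag}(\mu_1 - \lambda_{i_1}, \ldots, \mu_k - \lambda_{i_k})\| \le \|R\|
\]

and

\[
\|\operatorname{diag}(\mu_1 - \lambda_{i_1}, \ldots, \mu_k - \lambda_{i_k})\|_2 \le \sqrt{2}\, \|R\|_2.
\]

Use an argument similar to the one in the proof of Theorem VI.4.1 to
show that the factor $\sqrt{2}$ in the last inequality can be replaced by 1. This
raises the question whether we have

\[
|||\operatorname{diag}(\mu_1 - \lambda_{i_1}, \ldots, \mu_k - \lambda_{i_k})||| \le |||R|||
\]

for all unitarily invariant norms. This is not so, as can be seen from the
example

\[
A = \begin{pmatrix} 0 & 1 & \sqrt{3} \\ 1 & 0 & 0 \\ \sqrt{3} & 0 & 0 \end{pmatrix}.
\]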

Problem VI.8.13. Let $\Phi$ be a closed subset of $\mathbb{C}$, and let $F$ be a retraction
onto $\Phi$. Let $N(\Phi)$ be the set of all normal matrices whose spectrum is
contained in $\Phi$. Show that if $\Phi$ is a convex set, then for every unitarily
invariant norm

\[
|||B - F(B)||| \le |||B - A|||,
\]

whenever $B$ is a normal matrix and $A \in N(\Phi)$. If $\Phi$ is any closed set, then
the above inequality is true for all Q-norms.


Problem VI.8.14. The aim of this problem and the next is to outline
an alternative approach to the normal path inequality (VI.44). This uses
slightly more sophisticated notions of differential geometry.

Let $A$ be any $n \times n$ matrix, and let $O_A$ be the orbit of $A$ under the action
of the group $GL(n)$, i.e.,

\[
O_A = \{gAg^{-1} : g \in GL(n)\}.
\]

This is called the similarity orbit of $A$; it is the set of all matrices
similar to $A$. Every differentiable curve in $O_A$ passing through $A$ can be
parametrised locally as $e^{tX}Ae^{-tX}$, $X \in M(n)$. By the same argument as in
Section VI.4, the tangent space to $O_A$ at the point $A$ can be characterised
as

\[
T_A O_A = \{[A, X] : X \in M(n)\}.
\]

The orthogonal complement of this space in $M(n)$ can be calculated as in
Lemma VI.4.2. Show that

\[
(T_A O_A)^\perp = Z(A^*).
\]

Now, a matrix $A$ is normal if and only if $Z(A^*) = Z(A)$. So, for a normal
matrix we have a direct sum decomposition

\[
M(n) = T_A O_A \oplus Z(A).
\]

Now, if $B \in O_A$, then $B$ and $A$ have the same set of eigenvalues and hence

\[
d_2(\sigma(A), \sigma(B)) = 0.
\]

If $B \in Z(A)$, then there is an orthonormal basis in which $A$ and $B$ are both
upper triangular. Hence, for such a $B$,

\[
d_2(\sigma(A), \sigma(B)) \le \|A - B\|_2.
\]

Now, let $\gamma(t)$, $0 \le t \le 1$, be a $C^1$ curve in the space of normal matrices.
Let $\gamma(0) = A_0$, $\gamma(1) = A_1$. Let $\varphi(A) = d_2(\sigma(A_0), \sigma(A))$. At each point $\gamma(t)$
consider the decomposition

\[
M(n) = T_{\gamma(t)} O_{\gamma(t)} \oplus Z(\gamma(t))
\]

obtained above. Then, as we move along $\gamma(t)$, the rate of change of the
function $\varphi$ is zero in the first direction in this decomposition, and in the
second, it is bounded by the rate of change of the argument. Hence we
should have

\[
\varphi(\gamma(1)) \le \int_0^1 \|\gamma'(t)\|_2 \, dt.
\]

Prove this. Note that this says that

\[
d_2(\sigma(A_0), \sigma(A_1)) \le \ell_2(\gamma),
\]

if $\gamma$ is a $C^1$ curve passing through normal matrices and joining $A_0$ to $A_1$.


Problem VI.8.15. The two crucial properties of the Frobenius norm used 
above were its invariance under the conjugations A — UAU* and the 
pinching inequality. The first made it possible to change to any orthonormal 
basis, and the second was used to conclude that the diagonal of a matrix 
has norm smaller than the whole matrix. Both these properties are enjoyed 
by all wui norms. So, the method outlined above can be adopted to work 
for all wui norms to give the same result. (Some conditions on the path are 
necessary to ensure differentiability of the functions involved.) 


Problem VI.8.16. Fill in the details in the following outline of a proof of
the statement: every complex matrix with trace 0 is a commutator of two
matrices.

Let $A$ be a matrix such that $\operatorname{tr} A = 0$. Assume that $A$ is upper triangular.
Let $B$ be the nilpotent upper Jordan matrix (i.e., $B$ has all entries 0 except
the ones on the first superdiagonal, which are all 1). Then $Z(B^*)$ contains
only polynomials in $B^*$. (This is a general fact: $Z(X)$ contains only
polynomials in $X$ if and only if in the Jordan form of $X$ there is just one block for
each different eigenvalue.) Thus $Z(B^*)$ consists of lower triangular matrices
with constant diagonals. Show that $A$ is orthogonal to all such matrices.
Hence $A$ is in the space $T_B O_B$, and so $A = [B, C]$ for some $C$.
Theorem VI.3.3 is one of the several results proved in the paper Norms
and exclusion theorems, Numer. Math. 2 (1960) 137-141, by F.L. Bauer and
C.T. Fike. Theorem VI.3.11 was first proved by R. Bhatia and
C. Davis, A bound for the spectral variation of a unitary operator, Linear
and Multilinear Algebra, 15 (1984) 71-76. Their proof is summarised in
Problem VI.8.3. This approach of ordering eigenvalues in the cyclic order is
a well-known theorem; the proof we have outlined here was shown to us by
V.S. Sunder.


VII


Perturbation of Spectral Subspaces 
of Normal Matrices 


In Chapter 6 we saw that the eigenvalues of a (normal) matrix change
continuously with the matrix. The behaviour of eigenvectors is more
complicated. The following simple example is instructive. Let
$A = \begin{pmatrix} 1+\varepsilon & 0 \\ 0 & 1-\varepsilon \end{pmatrix} \oplus H$, $B = \begin{pmatrix} 1 & \varepsilon \\ \varepsilon & 1 \end{pmatrix} \oplus H$, where $H$ is Hermitian. The eigenvalues of the first $2 \times 2$
block of $A$ are $1 + \varepsilon$, $1 - \varepsilon$. The same is true for $B$. The corresponding
normalised eigenvectors are $(1, 0)$ and $(0, 1)$ for $A$, and $\frac{1}{\sqrt{2}}(1, 1)$ and $\frac{1}{\sqrt{2}}(1, -1)$
for $B$. As $\varepsilon \to 0$, $B$ and $A$ approach each other, but their eigenvectors
remain stubbornly apart. Note, however, that the eigenspaces that these two
eigenvectors of $A$ and $B$ span are identical. In this chapter we will see that
interesting and useful perturbation bounds may be obtained for eigenspaces
corresponding to closely bunched eigenvalues of normal matrices.
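The example can be traced numerically on the first $2 \times 2$ blocks alone. The sketch below (helper names are hypothetical, and the Frobenius norm is used only because it is easy to compute by hand) records the distance between the blocks for decreasing $\varepsilon$ together with the angle between the eigenvectors belonging to the eigenvalue $1 + \varepsilon$.

```python
import math

def first_blocks(eps):
    """First 2x2 blocks of A and B from the example, for a given eps."""
    a = [[1 + eps, 0.0], [0.0, 1 - eps]]
    b = [[1.0, eps], [eps, 1.0]]
    return a, b

def frobenius_distance(a, b):
    """Frobenius norm of a - b for 2x2 matrices given as nested lists."""
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2 for i in range(2) for j in range(2)))

def residual(m, lam, v):
    """Euclidean norm of m v - lam v; it vanishes when v is an eigenvector for lam."""
    r = [m[i][0] * v[0] + m[i][1] * v[1] - lam * v[i] for i in range(2)]
    return math.hypot(r[0], r[1])

u_a = (1.0, 0.0)                               # eigenvector of A for 1 + eps
u_b = (1 / math.sqrt(2), 1 / math.sqrt(2))     # eigenvector of B for 1 + eps
angle = math.acos(u_a[0] * u_b[0] + u_a[1] * u_b[1])

epsilons = (1e-1, 1e-3, 1e-6)
distances, residuals = [], []
for eps in epsilons:
    a, b = first_blocks(eps)
    distances.append(frobenius_distance(a, b))
    residuals.append(max(residual(a, 1 + eps, u_a), residual(b, 1 + eps, u_b)))

print(distances, angle)
```

The distance between the blocks shrinks like $2\varepsilon$, while the angle between the two eigenvectors does not depend on $\varepsilon$ at all.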

Before we do this, it is necessary to introduce notions of distance between
two subspaces. Also, it turns out that this perturbation problem is closely
related to the solution of the matrix equation $AX - XB = Y$. This equation,
called the Sylvester Equation, arises in several other contexts. So, we
will study it in some detail before applying the results to the perturbation
problem at hand.




VII.1 Pairs of Subspaces


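We will be dealing with block decompositions of matrices. To keep track of
dimensions, we will find it convenient to write

\[
A = \begin{array}{cc}
\begin{array}{cc} k & \ell \end{array} & \\
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} & \begin{array}{c} m \\ p \end{array}
\end{array}
\]

for a block-matrix in which $k$ and $\ell$ are the number of columns, and $m$ and
$p$ are the number of rows in the blocks indicated.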


Theorem VII.1.1 (The QR Decomposition) Let $A$ be an $m \times n$ matrix,
$m \ge n$. Then there is an $m \times m$ unitary matrix $Q$ such that

\[
Q^*A = \begin{pmatrix} R \\ 0 \end{pmatrix} \begin{array}{l} n \\ m - n \end{array}, \tag{VII.1}
\]

where $R$ is upper triangular with nonnegative real diagonal entries.


Proof. For a square matrix $A$, this was proved in Chapter 1. The same
proof also works here. (In essence this is just the Gram-Schmidt
process.) ■


The matrix R above is called the R factor of A. 


Exercise VII.1.2 Let $A$ be an $m \times n$ matrix with $\operatorname{rank} A = n$. Then the
$R$ factor in the QR decomposition of $A$ has positive diagonal elements and
is uniquely determined. (See Exercise I.2.2.) If we write

\[
Q = \begin{array}{cc}
\begin{array}{cc} n & m - n \end{array} & \\
\begin{pmatrix} Q_1 & Q_2 \end{pmatrix} & m
\end{array},
\]

then we have

\[
A = Q_1 R, \qquad Q_1 = AR^{-1}.
\]

Thus $Q_1$ is uniquely determined by $A$. However, $Q_2$ need not be unique.
Note the range of $A$ is the range of $Q_1$ and its orthogonal complement is
the range of $Q_2$.
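A minimal sketch of the factorisation $A = Q_1R$ by the Gram-Schmidt process follows. It assumes a real matrix with full column rank, stored as a list of rows; the name `thin_qr` is hypothetical, and the complex case and the completion of $Q_1$ to a full unitary $Q$ are deliberately left out. Each projection is taken against the running remainder (the "modified" variant of Gram-Schmidt), which gives the same factors in exact arithmetic.

```python
import math

def thin_qr(a):
    """Gram-Schmidt factorisation a = q1 r of a real m x n matrix with rank n.

    q1 is m x n with orthonormal columns; r is n x n upper triangular with
    positive diagonal entries.
    """
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = v[:]
        for k, q in enumerate(q_cols):
            # Coefficient of q_k, computed from the running remainder w.
            r[k][j] = sum(q[i] * w[i] for i in range(m))
            w = [w[i] - r[k][j] * q[i] for i in range(m)]
        r[j][j] = math.sqrt(sum(x * x for x in w))
        if r[j][j] == 0.0:
            raise ValueError("columns are linearly dependent")
        q_cols.append([x / r[j][j] for x in w])
    q1 = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q1, r

example = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]   # illustrative 3 x 2 matrix of rank 2
q1, r = thin_qr(example)
print(r)
```

Because the diagonal of `r` is forced to be positive, the factors are the unique ones described in the exercise, and `q1` could equally be recovered as $AR^{-1}$.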


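Exercise VII.1.3 Let $A$ be an $m \times n$ matrix with $m \le n$. Then there exists
an $n \times n$ unitary matrix $W$ such that

\[
AW = \begin{array}{cc}
\begin{array}{cc} m & n - m \end{array} & \\
\begin{pmatrix} L & 0 \end{pmatrix} & m
\end{array},
\]

where $L$ is lower triangular and has nonnegative real diagonal entries.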
We remark here that it is only for convenience that we choose $R$ (and
$L$) to have nonnegative diagonal entries. By modifying $Q$ (and $W$), we can
also make $R$ (and $L$) have nonpositive real diagonal entries.


Exercise VII.1.4 Let $A$ be an $m \times n$ matrix with $\operatorname{rank} A = r$. Then there
exists an $n \times n$ permutation matrix $P$ and an $m \times m$ unitary matrix $Q$ such
that

\[
Q^*AP = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix},
\]

where $R_{11}$ is an $r \times r$ upper triangular matrix with positive diagonal entries.
This is called a rank revealing QR decomposition.


Exercise VII.1.5 Let $A$ be an $m \times n$ matrix with $\operatorname{rank} A = r$. Then there
exists an $m \times m$ unitary matrix $Q$ and an $n \times n$ unitary matrix $W$ such
that

\[
Q^*AW = \begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix},
\]

where $T$ is an $r \times r$ triangular matrix with positive diagonal entries.


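Theorem VII.1.6 (The CS Decomposition) Let $W$ be an $n \times n$ unitary
matrix partitioned as

\[
W = \begin{array}{cc}
\begin{array}{cc} \ell & m \end{array} & \\
\begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix} & \begin{array}{c} \ell \\ m \end{array}
\end{array} \tag{VII.2}
\]

where $\ell \le m$. Then there exist unitary matrices $U = \operatorname{diag}(U_{11}, U_{22})$ and
$V = \operatorname{diag}(V_{11}, V_{22})$, where $U_{11}$, $V_{11}$ are $\ell \times \ell$ matrices, such that

\[
U^*WV = \begin{array}{cc}
\begin{array}{ccc} \ell & \ell & m - \ell \end{array} & \\
\begin{pmatrix} C & -S & 0 \\ S & C & 0 \\ 0 & 0 & I \end{pmatrix} & \begin{array}{c} \ell \\ \ell \\ m - \ell \end{array}
\end{array} \tag{VII.3}
\]

where $C$ and $S$ are nonnegative diagonal matrices, with diagonal entries
$0 \le c_1 \le \cdots \le c_\ell \le 1$ and $1 \ge s_1 \ge \cdots \ge s_\ell \ge 0$, respectively, and

\[
C^2 + S^2 = I.
\]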


Proof. For the sake of brevity, let us call a map $X \to U^*XV$ on the
space of $n \times n$ matrices a $\mathcal{U}$-transform, if $U$, $V$ are block diagonal unitary
matrices with top left blocks of size $\ell \times \ell$. The product of two $\mathcal{U}$-transforms
is again a $\mathcal{U}$-transform. We will prove the theorem by showing that one
can change the matrix $W$ in (VII.2) to the form (VII.3) by a succession of
$\mathcal{U}$-transforms.
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Let U11, Vii be @ x @ unitary matrices such that 


kK &f—k 
Ut WiiVi1 = ( 0 I ), 
where C\ is a diagonal matrix with diagonal entries 0 < c, <cg<ee< 


cy < 1. This is just the singular value decomposition: since W is unitary, all 
singular values of W,, are bounded by 1. Then the U-transform in which 
U = diag (Uji,/) and V = diag (Vi;,/) reduces W to the form 


(VII.4) 


where the structures of the three blocks whose entries are indicated by ?
are yet to be determined. Let $W_{21}$ denote now the bottom left corner of the
matrix (VII.4). By the QR Decomposition Theorem we can find an $m \times m$
unitary matrix $Q_{22}$ such that

$$Q_{22} W_{21} = \begin{pmatrix} R \\ 0 \end{pmatrix}, \tag{VII.5}$$

where $R$ is an $\ell \times \ell$ upper triangular matrix with nonnegative diagonal entries and the zero block is $(m-\ell) \times \ell$. The U-transform
in which $U = \operatorname{diag}(I, Q_{22})$ and $V = \operatorname{diag}(I, I)$ leaves the top
left corner of (VII.4) unchanged and changes the bottom left corner to the
form (VII.5). Assume that this transform has been carried out. Using the
fact that the columns of a unitary matrix are orthonormal, one sees that
the last $\ell - k$ columns of $R$ must be zero. Now examine the remaining
columns, proceeding from left to right. Since $C_1$ is diagonal with nonnegative
diagonal entries, all of which are strictly smaller than 1, one sees that
$R$ is also diagonal. Thus the matrix $W$ is now reduced to the form


$$\begin{pmatrix} \begin{matrix} C_1 & 0 \\ 0 & I \end{matrix} & ? \\ \begin{matrix} S_1 & 0 \\ 0 & 0 \\ 0 & 0 \end{matrix} & ? \end{pmatrix}, \tag{VII.6}$$

in which the row blocks have sizes $k$, $\ell-k$, $k$, $\ell-k$, $m-\ell$, the column blocks have sizes $k$, $\ell-k$, $m$, and $S_1$ is diagonal with $C_1^2 + S_1^2 = I$, and hence $0 < S_1 \le I$. The
structures of the two blocks on the right are yet to be determined. Now, by
Exercise VII.1.3 and the remark following it, we can find an $m \times m$ unitary
matrix $V_{22}$ which on right multiplication converts the top right block of
(VII.6) to the form

$$\begin{pmatrix} L & 0 \end{pmatrix}, \tag{VII.7}$$

in which $L$ is an $\ell \times \ell$ lower triangular matrix with nonpositive diagonal entries. The U-transform
in which $U = \operatorname{diag}(I, I)$ and $V = \operatorname{diag}(I, V_{22})$ leaves the two
left blocks in (VII.6) unchanged and converts the top right block to the
form (VII.7). Again, orthonormality of the rows of a unitary matrix and
a repetition of the argument in the preceding step show that after this
U-transform the matrix $W$ is reduced to the form


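$$\left(\begin{array}{cc|ccc} C_1 & 0 & -S_1 & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \hline S_1 & 0 & X_{33} & X_{34} & X_{35} \\ 0 & 0 & X_{43} & X_{44} & X_{45} \\ 0 & 0 & X_{53} & X_{54} & X_{55} \end{array}\right), \tag{VII.8}$$

where the row blocks and the column blocks both have sizes $k$, $\ell-k$, $k$, $\ell-k$, $m-\ell$.

Now, we determine the form of the bottom right corner. Since the rows of
a unitary matrix are mutually orthogonal, we must have $C_1 S_1 = S_1 X_{33}$.
But $C_1$ and $S_1$ are diagonal and $S_1$ is invertible. Hence, we must have
$X_{33} = C_1$. But then the blocks $X_{34}, X_{35}, X_{43}, X_{53}$ must all be 0, since the
matrix (VII.8) is unitary. So, this matrix has the form

$$\left(\begin{array}{cc|ccc} C_1 & 0 & -S_1 & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \hline S_1 & 0 & C_1 & 0 & 0 \\ 0 & 0 & 0 & X_{44} & X_{45} \\ 0 & 0 & 0 & X_{54} & X_{55} \end{array}\right). \tag{VII.9}$$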


Let $X = \begin{pmatrix} X_{44} & X_{45} \\ X_{54} & X_{55} \end{pmatrix}$. Then $X$ is a unitary matrix of size $m - k$. Let
$U = \operatorname{diag}(I_\ell, I_k, X)$, where $I_\ell$ and $I_k$ are the identity operators of sizes $\ell$
and $k$, respectively. Then, multiplying (VII.9) on the left by $U^*$ (another
U-transform), we reduce it to the form

$$\left(\begin{array}{cc|cc|c} C_1 & 0 & -S_1 & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \hline S_1 & 0 & C_1 & 0 & 0 \\ 0 & 0 & 0 & I & 0 \\ \hline 0 & 0 & 0 & 0 & I \end{array}\right), \tag{VII.10}$$

with row and column blocks of sizes $k$, $\ell-k$, $k$, $\ell-k$, $m-\ell$. If we now put

$$C = \begin{pmatrix} C_1 & 0 \\ 0 & I \end{pmatrix}$$

and

$$S = \begin{pmatrix} S_1 & 0 \\ 0 & 0 \end{pmatrix},$$

each with diagonal blocks of sizes $k$ and $\ell - k$, then the matrix (VII.10) is in the desired form (VII.3). ∎


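Exercise VII.1.7 Let $W$ be as in (VII.2) but with $\ell > m$. Show that there
exist unitary matrices $U = \operatorname{diag}(U_{11}, U_{22})$ and $V = \operatorname{diag}(V_{11}, V_{22})$, where
$U_{11}, V_{11}$ are $\ell \times \ell$ matrices, such that

$$U^*WV = \begin{pmatrix} C & 0 & -S \\ 0 & I & 0 \\ S & 0 & C \end{pmatrix}, \tag{VII.11}$$

with diagonal blocks of sizes $n-\ell$, $2\ell-n$, $n-\ell$, where $C$ and $S$ are nonnegative diagonal matrices with diagonal entries
$0 \le c_1 \le \cdots \le c_{n-\ell} \le 1$ and $1 \ge s_1 \ge \cdots \ge s_{n-\ell} \ge 0$, respectively, and
$C^2 + S^2 = I$.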


The form of the matrices $C$ and $S$ in the above decompositions suggests
an obvious interpretation in terms of angles. There exist (acute) angles
$\theta_j$, $\frac{\pi}{2} \ge \theta_1 \ge \theta_2 \ge \cdots \ge 0$, such that $c_j = \cos\theta_j$ and $s_j = \sin\theta_j$.

One of the major applications of the CS decomposition is the facility it
provides for analysing the relative position of two subspaces of $\mathbb{C}^n$.


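Theorem VII.1.8 Let $X_1, Y_1$ be $n \times \ell$ matrices with orthonormal
columns. Then there exist $\ell \times \ell$ unitary matrices $U_1$ and $V_1$ and an $n \times n$
unitary matrix $Q$ with the following properties.

(i) If $2\ell \le n$, then

$$QX_1U_1 = \begin{pmatrix} I \\ 0 \\ 0 \end{pmatrix}, \tag{VII.12}$$

$$QY_1V_1 = \begin{pmatrix} C \\ S \\ 0 \end{pmatrix}, \tag{VII.13}$$

where the row blocks have sizes $\ell$, $\ell$, $n-2\ell$, and $C, S$ are diagonal matrices with diagonal entries $0 \le c_1 \le \cdots \le c_\ell \le 1$
and $1 \ge s_1 \ge \cdots \ge s_\ell \ge 0$, respectively, and $C^2 + S^2 = I$.

(ii) If $2\ell > n$, then

$$QX_1U_1 = \begin{pmatrix} I & 0 \\ 0 & I \\ 0 & 0 \end{pmatrix}, \tag{VII.14}$$

$$QY_1V_1 = \begin{pmatrix} C & 0 \\ 0 & I \\ S & 0 \end{pmatrix}, \tag{VII.15}$$

where the row blocks have sizes $n-\ell$, $2\ell-n$, $n-\ell$, the column blocks have sizes $n-\ell$, $2\ell-n$, and $C, S$ are diagonal matrices with diagonal entries $0 \le c_1 \le \cdots \le c_{n-\ell} \le 1$ and $1 \ge s_1 \ge \cdots \ge s_{n-\ell} \ge 0$, respectively, and $C^2 + S^2 = I$.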


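Proof. Let $2\ell \le n$. Choose $n \times (n - \ell)$ matrices $X_2$ and $Y_2$ such that
$X = (X_1\ X_2)$ and $Y = (Y_1\ Y_2)$ are unitary. Let

$$W = X^*Y = \begin{pmatrix} X_1^*Y_1 & X_1^*Y_2 \\ X_2^*Y_1 & X_2^*Y_2 \end{pmatrix},$$

with diagonal blocks of sizes $\ell$ and $n - \ell$.
By Theorem VII.1.6 we can find block diagonal unitary matrices $U = \operatorname{diag}(U_1, U_2)$
and $V = \operatorname{diag}(V_1, V_2)$, in which $U_1$ and $V_1$ are $\ell \times \ell$ unitaries,
such that

$$\begin{pmatrix} U_1^*X_1^*Y_1V_1 & U_1^*X_1^*Y_2V_2 \\ U_2^*X_2^*Y_1V_1 & U_2^*X_2^*Y_2V_2 \end{pmatrix} = \begin{pmatrix} C & -S & 0 \\ S & C & 0 \\ 0 & 0 & I \end{pmatrix}.$$

Let $Q = (XU)^* = (X_1U_1\ \ X_2U_2)^*$. Then from the first columns of the two
sides of the above equation we obtain the equation (VII.13). For this $Q$ the
equation (VII.12) is also true.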




When $2\ell > n$, the assertion of the theorem follows, in the same manner,
from the decomposition (VII.11). ∎


This theorem can be interpreted as follows. Let $\mathcal{E}$ and $\mathcal{F}$ be $\ell$-dimensional
subspaces of $\mathbb{C}^n$. Choose orthonormal bases $x_1, \ldots, x_\ell$ and $y_1, \ldots, y_\ell$ for
these spaces. Let $X_1 = (x_1\ x_2\ \cdots\ x_\ell)$, $Y_1 = (y_1\ y_2\ \cdots\ y_\ell)$. Premultiplying
$X_1, Y_1$ by $Q$ corresponds to a unitary transformation of the whole space
$\mathbb{C}^n$, while postmultiplying $X_1$ by $U_1$ and $Y_1$ by $V_1$ corresponds to a change
of bases within the spaces $\mathcal{E}$ and $\mathcal{F}$, respectively. Thus, the theorem says
that, if $2\ell \le n$, then there exists a unitary transformation $Q$ of $\mathbb{C}^n$ such
that the columns of the matrices on the right-hand sides in (VII.12) and
(VII.13) form orthonormal bases for $Q\mathcal{E}$ and $Q\mathcal{F}$, respectively. The span
of those columns in the second matrix, for which $s_j = 1$, is the orthogonal
complement of $Q\mathcal{E}$ in $Q\mathcal{F}$. When $2\ell > n$, the columns of the matrices on
the right-hand sides of (VII.14) and (VII.15) form orthonormal bases for
$Q\mathcal{E}$ and $Q\mathcal{F}$, respectively. The last $2\ell - n$ columns are orthonormal vectors
in the intersection of these two spaces. The space spanned by those columns
of the second matrix, in which $s_j = 1$, is the orthogonal complement of $Q\mathcal{E}$
in $Q\mathcal{F}$.

The reader might find it helpful to see what the above theorem says when
$\mathcal{E}$ and $\mathcal{F}$ are lines or planes in $\mathbb{R}^3$.

Using the notation above, we set

$$\Theta(\mathcal{E}, \mathcal{F}) = \arcsin S.$$

This is called the angle operator between the subspaces $\mathcal{E}$ and $\mathcal{F}$. It is a
diagonal matrix, and its diagonal entries are called the canonical angles
between the subspaces $\mathcal{E}$ and $\mathcal{F}$.

If the columns of a matrix $X$ are orthonormal and span the subspace $\mathcal{E}$,
then the orthogonal projection onto $\mathcal{E}$ is given by the matrix $E = XX^*$.
This fact is used repeatedly below.


Exercise VII.1.9 Let $\mathcal{E}$ and $\mathcal{F}$ be subspaces of $\mathbb{C}^n$. Let $X$ and $Y$ be matrices
with orthonormal columns that span $\mathcal{E}$ and $\mathcal{F}$, respectively. Let $E, F$
be the orthogonal projections with ranges $\mathcal{E}, \mathcal{F}$. Then the nonzero singular
values of $EF$ are the same as the nonzero singular values of $X^*Y$.


Exercise VII.1.10 Let $\mathcal{E}, \mathcal{F}$ be subspaces of $\mathbb{C}^n$ of the same dimension,
and let $E, F$ be the orthogonal projections with ranges $\mathcal{E}, \mathcal{F}$. Then the singular
values of $EF$ are the cosines of the canonical angles between $\mathcal{E}$ and
$\mathcal{F}$, and the nonzero singular values of $E^\perp F$ are the sines of the nonzero
canonical angles between $\mathcal{E}$ and $\mathcal{F}$.


Exercise VII.1.11 Let $\mathcal{E}, \mathcal{F}$ and $E, F$ be as above. Then the nonzero singular
values of $E - F$ are the nonzero singular values of $E^\perp F$, each counted
twice; i.e., these are the numbers $s_1, s_1, s_2, s_2, \ldots$




Note that by Exercise VII.1.10, the angle operator $\Theta(\mathcal{E}, \mathcal{F})$ does not
depend on the choice of any particular bases in $\mathcal{E}$ and $\mathcal{F}$. Further, $\Theta(\mathcal{E}, \mathcal{F}) = \Theta(\mathcal{F}, \mathcal{E})$.

It is natural to define the distance between the spaces $\mathcal{E}$ and $\mathcal{F}$ as
$\|E - F\|$. In view of Exercise VII.1.11, this is also the number $\|E^\perp F\|$. More
generally, we might consider $|||E^\perp F|||$, for every unitarily invariant norm,
to define a distance between the spaces $\mathcal{E}$ and $\mathcal{F}$. In this case, $|||E - F||| = |||E^\perp F \oplus EF^\perp|||$.
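
As a quick check of these statements, here is a worked instance for two lines in $\mathbb{R}^2$. The example is an illustrative choice and is not taken from the text; the angle $\theta \in (0, \pi/2)$ and the abbreviations $c = \cos\theta$, $s = \sin\theta$ are assumptions of the sketch. Take $\mathcal{E}$ spanned by $(1, 0)^T$ and $\mathcal{F}$ spanned by $(c, s)^T$. Then

$$E = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad F = \begin{pmatrix} c^2 & cs \\ cs & s^2 \end{pmatrix}, \quad EF = \begin{pmatrix} c^2 & cs \\ 0 & 0 \end{pmatrix}, \quad E^\perp F = \begin{pmatrix} 0 & 0 \\ cs & s^2 \end{pmatrix}, \quad E - F = \begin{pmatrix} s^2 & -cs \\ -cs & -s^2 \end{pmatrix}.$$

The only nonzero singular value of $EF$ is $\sqrt{c^4 + c^2s^2} = c$, and that of $E^\perp F$ is $\sqrt{c^2s^2 + s^4} = s$. The matrix $E - F$ is Hermitian with trace $0$ and determinant $-s^4 - c^2s^2 = -s^2$, so its eigenvalues are $\pm s$ and its singular values are $s, s$. The single canonical angle is therefore $\theta$, and $\|E - F\| = \|E^\perp F\| = \sin\theta$, in agreement with Exercises VII.1.10 and VII.1.11.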

We could use the numbers $|||E^\perp F|||$ to measure the separation of $\mathcal{E}$ and
$\mathcal{F}$, even when they have different dimensions. Even the principal angles can
be defined in this case:


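Exercise VII.1.12 Let $x, y$ be any two vectors in $\mathbb{C}^n$. The angle between
$x$ and $y$ is defined to be a number $\angle(x, y)$ in $[0, \pi/2]$ such that

$$\angle(x, y) = \cos^{-1} \frac{|y^*x|}{\|x\|\,\|y\|}.$$

Let $\mathcal{E}$ and $\mathcal{F}$ be subspaces of $\mathbb{C}^n$, and let $\dim \mathcal{E} \ge \dim \mathcal{F} = m$. Define
$\theta_1, \ldots, \theta_m$ recursively as

$$\theta_k = \max_{\substack{x \in \mathcal{E} \\ x \perp [x_1, \ldots, x_{k-1}]}} \ \min_{\substack{y \in \mathcal{F} \\ y \perp [y_1, \ldots, y_{k-1}]}} \angle(x, y) = \angle(x_k, y_k).$$

Then $\frac{\pi}{2} \ge \theta_1 \ge \cdots \ge \theta_m \ge 0$. The numbers $\theta_k$ are called the principal
angles between $\mathcal{E}$ and $\mathcal{F}$. Show that when $\dim \mathcal{E} = \dim \mathcal{F}$, this coincides
with the earlier definition of principal angles.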


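Exercise VII.1.13 Show that for any two orthogonal projections $E, F$ we
have $\|E - F\| \le 1$.


Proposition VII.1.14 Let $E, F$ be two orthogonal projections such that
$\|E - F\| < 1$. Then the ranges of $E$ and $F$ have the same dimensions.


Proof. Let $\mathcal{E}, \mathcal{F}$ be the ranges of $E$ and $F$. Suppose $\dim \mathcal{E} > \dim \mathcal{F}$.
We will show that $\mathcal{E} \cap \mathcal{F}^\perp$ contains a nonzero vector. This will show that
$\|E - F\| = 1$.

Let $\mathcal{G} = E\mathcal{F}$. Then $\mathcal{G} \subset \mathcal{E}$, and $\dim \mathcal{G} \le \dim \mathcal{F} < \dim \mathcal{E}$. Hence $\mathcal{E} \cap \mathcal{G}^\perp$
contains a nonzero vector $x$. It is easy to see that $\mathcal{E} \cap \mathcal{G}^\perp \subset \mathcal{F}^\perp$. Hence,
$x \in \mathcal{F}^\perp$. ∎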


In most situations in perturbation theory we will be interested in comparing
two projections $E, F$ such that $\|E - F\|$ is small. The above proposition
shows that in this case $\dim \mathcal{E} = \dim \mathcal{F}$.

Example VII.1.15 Let


1 0 1 0O 
1 1 0 0 O 
xX,=— — 
1~Fwlou1il 0 -1 
0 1 0 0 
The columns of X, and of Y,; are orthonormal vectors. If we choose unitary 
matrices 
1 1 1 1 1 1 
U0; = — — —_ 
Ha} An 4 1). 
and 
1 1 1 1 
_1fi1 1 -1 —-1 
Q= 5]. 1 = 1 -1 I> 


then we see that 


1 O 1 0 

0 1 i 0 1 
xX = — 

0 0 0 1 


Thus in the space $\mathbb{R}^4$ (or $\mathbb{C}^4$), the canonical angles between the 2-dimensional
subspaces spanned by the columns of $X_1$ and $Y_1$, respectively, are $\frac{\pi}{4}, \frac{\pi}{4}$.


VII.2 The Equation AX - XB = Y


We study in some detail the Sylvester equation,

$$AX - XB = Y. \tag{VII.16}$$

Here $A$ is an operator on a Hilbert space $\mathcal{H}$, $B$ is an operator on a Hilbert
space $\mathcal{K}$, and $X, Y$ are operators from $\mathcal{K}$ into $\mathcal{H}$. Most of the time we are
interested in the situation when $\mathcal{K} = \mathcal{H} = \mathbb{C}^n$, and we will state and prove
our results for this special case. The extension to the more general situation
is straightforward.

We are given $A$ and $B$, and we ask the following questions about the
above equation. When is there a unique solution $X$ for every $Y$? What is
the form of the solution? Can we estimate $\|X\|$ in terms of $\|Y\|$?


Theorem VII.2.1 Let $A, B$ be operators with spectra $\sigma(A)$ and $\sigma(B)$, respectively.
If $\sigma(A)$ and $\sigma(B)$ are disjoint, then the equation (VII.16) has a
unique solution $X$ for every $Y$.

Proof. Let $\mathcal{T}$ be the linear operator on the space of operators, defined by
$\mathcal{T}(X) = AX - XB$. The conclusion of the theorem can be rephrased as: $\mathcal{T}$
is invertible if $\sigma(A)$ and $\sigma(B)$ are disjoint.

Let $\mathcal{A}(X) = AX$ and $\mathcal{B}(X) = XB$. Then $\mathcal{T} = \mathcal{A} - \mathcal{B}$, and $\mathcal{A}$ and $\mathcal{B}$
commute (regardless of whether $A$ and $B$ do). Hence, $\sigma(\mathcal{T}) \subset \sigma(\mathcal{A}) - \sigma(\mathcal{B})$.
If $x$ is an eigenvector of $A$ with eigenvalue $\alpha$, then the matrix $X$, one of
whose columns is $x$ and the rest of whose columns are zero, is an eigenvector
of $\mathcal{A}$ with eigenvalue $\alpha$. Thus the eigenvalues of $\mathcal{A}$ are just the eigenvalues
of $A$, each counted $n$ times as often. So $\sigma(\mathcal{A}) = \sigma(A)$. In the same way,
$\sigma(\mathcal{B}) = \sigma(B)$. Hence $\sigma(\mathcal{T}) \subset \sigma(A) - \sigma(B)$. So, if $\sigma(A)$ and $\sigma(B)$ are disjoint,
then $0 \notin \sigma(\mathcal{T})$. Thus, $\mathcal{T}$ is invertible. ∎
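
For a concrete view of the spectrum of the operator $X \mapsto AX - XB$, consider the special case (an illustrative assumption, not needed for the proof) in which $A = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$ and $B = \operatorname{diag}(\beta_1, \ldots, \beta_n)$. Writing $E_{ij}$ for the matrix unit with $1$ in position $(i, j)$ and zeros elsewhere, a notation introduced only for this sketch, one has

$$AE_{ij} - E_{ij}B = (\alpha_i - \beta_j)E_{ij}, \qquad (AX - XB)_{ij} = (\alpha_i - \beta_j)\,x_{ij}.$$

So the $n^2$ matrix units are eigenvectors of this operator, its eigenvalues are exactly the differences $\alpha_i - \beta_j$, and the equation $AX - XB = Y$ is solved entrywise by $x_{ij} = y_{ij}/(\alpha_i - \beta_j)$ precisely when no $\alpha_i$ coincides with a $\beta_j$.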


It is instructive to note that the scalar equation $ax - xb = y$ has a unique
solution $x$ for every $y$ if $a - b \ne 0$. The condition $0 \notin \sigma(A) - \sigma(B)$ can be
interpreted to be a generalisation of this to the matrix case. This analogy
will be helpful in the discussion that follows.

Consider the scalar equation $ax - xb = y$. Exclude the trivial cases in
which $a = b$ and in which either $a$ or $b$ is zero. The solution to this equation
can be written as

$$x = a^{-1}\left(1 - \frac{b}{a}\right)^{-1} y.$$

If $|b| < |a|$, the middle factor on the right can be expanded as a convergent
power series, and we can write

$$x = a^{-1} \sum_{n=0}^{\infty} \left(\frac{b}{a}\right)^n y = \sum_{n=0}^{\infty} a^{-n-1}\, y\, b^n.$$

This is surely a complicated way of writing $x = y/(a - b)$. However, it
suggests, in the operator case, the form of the solution given in the theorem
below. For the proof of the theorem we will need the spectral radius
formula. This says that the spectral radius of any operator $A$ is given by
the formula

$$\operatorname{spr}(A) = \lim_{n \to \infty} \|A^n\|^{1/n}.$$


Theorem VII.2.2 Let $A, B$ be operators such that $\sigma(B) \subset \{z : |z| < \rho\}$
and $\sigma(A) \subset \{z : |z| > \rho\}$ for some $\rho > 0$. Then the solution of the equation
$AX - XB = Y$ is

$$X = \sum_{n=0}^{\infty} A^{-n-1} Y B^n. \tag{VII.17}$$


Proof. We will prove that the series converges. It is then easy to see that
$X$ so defined is a solution of the equation.

Choose $\rho_1 < \rho < \rho_2$ such that $\sigma(B)$ is contained in the disk $\{z : |z| < \rho_1\}$
and $\sigma(A)$ is outside the disk $\{z : |z| \le \rho_2\}$. Then $\sigma(A^{-1})$ is inside the disk
$\{z : |z| < \rho_2^{-1}\}$. By the spectral radius formula, there exists a positive
integer $N$ such that for $n \ge N$, $\|B^n\| < \rho_1^n$ and $\|A^{-n}\| < \rho_2^{-n}$. Hence, for
$n \ge N$, $\|A^{-n-1}YB^n\| < (\rho_1/\rho_2)^n \|A^{-1}Y\|$. Thus the series in (VII.17) is
convergent. ∎
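
The series (VII.17) can be tried out numerically. The following is a minimal sketch, assuming $2 \times 2$ real matrices stored as nested lists; the helper names (`matmul`, `inv2`, `series_solution`) and the test data are hypothetical choices made for illustration, not part of the text. It sums finitely many terms and checks the residual $AX - XB - Y$.

```python
# Partial sums of X = sum_{n>=0} A^{-n-1} Y B^n for 2x2 real matrices (plain lists).

def matmul(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(P, Q):
    return [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]

def sub(P, Q):
    return [[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)]

def inv2(P):
    """Inverse of an invertible 2x2 matrix."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]

def series_solution(A, B, Y, terms=200):
    """Sum the first `terms` terms of the series (VII.17)."""
    A_inv = inv2(A)
    term = matmul(A_inv, Y)                     # n = 0 term: A^{-1} Y
    X = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(terms):
        X = add(X, term)
        term = matmul(matmul(A_inv, term), B)   # next term: A^{-1} (term) B
    return X

# Hypothetical data: sigma(A) = {3, 4} lies outside |z| <= 2,
# sigma(B) = {0.5, -1} lies inside |z| < 2. Neither matrix is normal.
A = [[3.0, 1.0], [0.0, 4.0]]
B = [[0.5, 0.0], [1.0, -1.0]]
Y = [[1.0, 2.0], [3.0, 4.0]]

X = series_solution(A, B, Y)
residual = sub(sub(matmul(A, X), matmul(X, B)), Y)
max_residual = max(abs(e) for row in residual for e in row)
print(max_residual)   # close to zero, up to rounding
```

The partial sum $X_N$ of the first $N$ terms satisfies $AX_N - X_NB = Y - A^{-N}YB^N$ by telescoping, which is why the residual decays geometrically under the hypothesis of the theorem.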

Another solution to (VII.16) is obtained from the following considerations.
If $\operatorname{Re}(b - a) < 0$, the integral $\int_0^\infty e^{t(b-a)}\,dt$ is convergent and has the
value $\frac{1}{a - b}$. Thus, in this case, the solution of the equation $ax - xb = y$ can
be expressed as $x = \int_0^\infty e^{t(b-a)}\, y\,dt$. This is the motivation for the following
theorem.


Theorem VII.2.3 Let $A$ and $B$ be operators whose spectra are contained
in the open right half-plane and the open left half-plane, respectively. Then
the solution of the equation $AX - XB = Y$ can be expressed as

$$X = \int_0^\infty e^{-tA}\, Y e^{tB}\,dt. \tag{VII.18}$$


Proof. It is easy to see that the hypotheses ensure that the integral given
above is convergent. If $X$ is the operator defined by this integral, then

$$AX - XB = \int_0^\infty \left(Ae^{-tA}Ye^{tB} - e^{-tA}Ye^{tB}B\right)dt = -e^{-tA}Ye^{tB}\Big|_0^\infty = Y.$$

So $X$ is indeed the solution of the equation. ∎


Notice that in both the theorems above we made a special assumption
about the way $\sigma(A)$ and $\sigma(B)$ are separated. No such assumption is made
in the theorem below. Once again, it is helpful to consider the scalar case
first. Note that

$$\frac{1}{(a - \zeta)(b - \zeta)} = \left(\frac{1}{a - \zeta} - \frac{1}{b - \zeta}\right)\frac{1}{b - a}.$$

So, if $\Gamma$ is any closed contour in the complex plane with winding numbers
1 around $a$ and 0 around $b$, then by Cauchy's integral formula we have

$$\int_\Gamma \frac{d\zeta}{(a - \zeta)(b - \zeta)} = \frac{2\pi i}{a - b}.$$

Thus the solution of the equation $ax - xb = y$ can be expressed as

$$x = \frac{1}{2\pi i} \int_\Gamma \frac{y}{(a - \zeta)(b - \zeta)}\,d\zeta.$$

The appropriate generalisation for operators is the following.


Theorem VII.2.4 Let $A$ and $B$ be operators whose spectra are disjoint
from each other. Let $\Gamma$ be any closed contour in the complex plane with
winding numbers 1 around $\sigma(A)$ and 0 around $\sigma(B)$. Then the solution of
the equation $AX - XB = Y$ can be expressed as

$$X = \frac{1}{2\pi i} \int_\Gamma (A - \zeta)^{-1}\, Y (B - \zeta)^{-1}\,d\zeta. \tag{VII.19}$$


Proof. If $AX - XB = Y$, then for every complex number $\zeta$,
$(A - \zeta)X - X(B - \zeta) = Y$. If $A - \zeta$ and $B - \zeta$ are invertible, this gives

$$X(B - \zeta)^{-1} - (A - \zeta)^{-1}X = (A - \zeta)^{-1}\, Y (B - \zeta)^{-1}.$$

Integrate both sides over the given contour $\Gamma$ and note that $\int_\Gamma (B - \zeta)^{-1}\,d\zeta = 0$
and $-\int_\Gamma (A - \zeta)^{-1}\,d\zeta = 2\pi i\, I$. This proves the theorem. ∎
r 


Our principal interest is in the case when $A$ and $B$ in the equation
(VII.16) are both normal or, even more specially, Hermitian or unitary. In
these cases more special forms of the solution can be obtained.

Let $A$ and $B$ be both Hermitian. Then $iA$ and $iB$ are skew-Hermitian,
and hence their spectra lie on the imaginary line. This is just the opposite of
the situation that Theorem VII.2.3 was addressed to. If we were to imitate
that solution, we would try out the integral $\int_0^\infty e^{-itA}\, Y e^{itB}\,dt$. This, however,
does not converge. This can be remedied by inserting a convergence factor:
a function $f$ in $L^1(\mathbb{R})$. If we set

$$X = \int_{-\infty}^{\infty} e^{-itA}\, Y e^{itB} f(t)\,dt,$$

then this is a well-defined operator for each $f \in L^1(\mathbb{R})$, since for each $t$ the
exponentials occurring above are unitary operators. Of course, such an $X$
need not be a solution of the equation (VII.16). Can a special choice of $f$
make it so? Once again, it is instructive to first examine the scalar case. In
this case, the above expression reduces to

$$x = y\hat{f}(a - b),$$

where $\hat{f}$ is the Fourier transform of $f$, defined as

$$\hat{f}(s) = \int_{-\infty}^{\infty} e^{-its} f(t)\,dt. \tag{VII.20}$$




So, if we choose an $f$ such that $\hat{f}(a - b) = \frac{1}{a - b}$, we do have $ax - xb = y$.
The following theorem generalises this to operators.


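Theorem VII.2.5 Let $A, B$ be Hermitian operators whose spectra are disjoint
from each other. Let $f$ be any function in $L^1(\mathbb{R})$ such that $\hat{f}(s) = \frac{1}{s}$
whenever $s \in \sigma(A) - \sigma(B)$. Then the solution of the equation $AX - XB = Y$
can be expressed as

$$X = \int_{-\infty}^{\infty} e^{-itA}\, Y e^{itB} f(t)\,dt. \tag{VII.21}$$


Proof. Let $\alpha$ and $\beta$ be eigenvalues of $A$ and $B$ with eigenvectors $u$ and
$v$, respectively. Then, using the fact that $e^{itA}$ is unitary and its adjoint is
$e^{-itA}$, we see that

$$\langle u, Ae^{-itA}\, Y e^{itB} v\rangle = \langle e^{itA}Au, Y e^{itB}v\rangle = e^{it(\beta - \alpha)}\alpha\langle u, Yv\rangle.$$

A similar consideration shows that

$$\langle u, e^{-itA}\, Y e^{itB} B v\rangle = e^{it(\beta - \alpha)}\beta\langle u, Yv\rangle.$$

Hence, if $X$ is given by (VII.21), we have

$$\langle u, (AX - XB)v\rangle = (\alpha - \beta)\langle u, Yv\rangle \int_{-\infty}^{\infty} e^{it(\beta - \alpha)} f(t)\,dt = (\alpha - \beta)\langle u, Yv\rangle\, \hat{f}(\alpha - \beta) = \langle u, Yv\rangle.$$

Since eigenvectors of $A$ and $B$ both span the whole space, this shows that
$AX - XB = Y$. ∎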


The two theorems below can be proved using the same argument as
above. For a function $f$ in $L^1(\mathbb{R}^2)$ we will use the notation $\hat{f}$ for its Fourier
transform, defined as

$$\hat{f}(s_1, s_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i(t_1s_1 + t_2s_2)} f(t_1, t_2)\,dt_1\,dt_2.$$


Theorem VII.2.6 Let $A$ and $B$ be normal operators whose spectra are
disjoint from each other. Let $A = A_1 + iA_2$, $B = B_1 + iB_2$, where $A_1$ and
$A_2$ are commuting Hermitian operators and so are $B_1$ and $B_2$. Let $f$ be
any function in $L^1(\mathbb{R}^2)$ such that $\hat{f}(s_1, s_2) = \frac{1}{s_1 + is_2}$ whenever $s_1 + is_2 \in \sigma(A) - \sigma(B)$.
Then the solution of the equation $AX - XB = Y$ can be
expressed as

$$X = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i(t_1A_1 + t_2A_2)}\, Y e^{i(t_1B_1 + t_2B_2)} f(t_1, t_2)\,dt_1\,dt_2. \tag{VII.22}$$

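Theorem VII.2.7 Let $A$ and $B$ be unitary operators whose spectra are
disjoint from each other. Let $\{a_n\}_{n=-\infty}^{\infty}$ be any sequence in $\ell_1$ such that

$$\sum_{n=-\infty}^{\infty} a_n e^{in\theta} = \frac{1}{1 - e^{i\theta}} \quad \text{whenever } e^{i\theta} \in (\sigma(A))^{-1} \cdot \sigma(B).$$

Then the solution of the equation $AX - XB = Y$ can be expressed as

$$X = \sum_{n=-\infty}^{\infty} a_n A^{-n-1}\, Y B^n. \tag{VII.23}$$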


The different formulae obtained above lead to estimates for $|||X|||$ when
$A$ and $B$ are normal. These estimates involve $|||Y|||$ and the separation $\delta$
between $\sigma(A)$ and $\sigma(B)$, where

$$\delta = \operatorname{dist}(\sigma(A), \sigma(B)) = \min\{|\lambda - \mu| : \lambda \in \sigma(A),\ \mu \in \sigma(B)\}.$$

The special case of the Frobenius norm is the simplest.


Theorem VII.2.8 Let $A$ and $B$ be normal matrices, and let $\delta = \operatorname{dist}(\sigma(A), \sigma(B)) > 0$.
Then the solution $X$ of the equation $AX - XB = Y$ satisfies the inequality

$$\|X\|_2 \le \frac{1}{\delta}\|Y\|_2. \tag{VII.24}$$


Proof. If $A$ and $B$ are both diagonal with diagonal entries $\lambda_1, \ldots, \lambda_n$
and $\mu_1, \ldots, \mu_n$, respectively, then the entries of $X$ and $Y$ are related by the
equation $x_{ij} = y_{ij}/(\lambda_i - \mu_j)$. From this (VII.24) follows immediately.

If $A, B$ are any normal matrices, we can find unitary matrices $U, V$ and
diagonal matrices $A', B'$ such that $A = UA'U^*$ and $B = VB'V^*$. The
equation $AX - XB = Y$ can be rewritten as

$$UA'U^*X - XVB'V^* = Y$$

and then as

$$A'(U^*XV) - (U^*XV)B' = U^*YV.$$

So, we now have the same type of equation but with diagonal $A', B'$. Hence,

$$\|U^*XV\|_2 \le \frac{1}{\delta}\|U^*YV\|_2.$$

By the unitary invariance of the Frobenius norm this is the same as the
inequality (VII.24). ∎
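
The diagonal step of this proof translates directly into a few lines of code. The sketch below assumes that $A$ and $B$ are already diagonal and are given by their lists of eigenvalues; the function names and the sample data are hypothetical and serve only to illustrate the entrywise formula $x_{ij} = y_{ij}/(\lambda_i - \mu_j)$ and the bound (VII.24).

```python
import math

def solve_diagonal_sylvester(lam, mu, Y):
    """Solve AX - XB = Y for A = diag(lam), B = diag(mu): x_ij = y_ij / (lam_i - mu_j)."""
    return [[Y[i][j] / (lam[i] - mu[j]) for j in range(len(mu))]
            for i in range(len(lam))]

def frobenius(M):
    """Frobenius norm: square root of the sum of squared moduli of the entries."""
    return math.sqrt(sum(abs(e) ** 2 for row in M for e in row))

# Hypothetical normal (diagonal) data with complex eigenvalues.
lam = [2 + 1j, -3j, 4.0]          # eigenvalues of A
mu = [0.5j, -1.0, 1 + 1j]         # eigenvalues of B
Y = [[1, 2j, -1], [0.5, 1 - 1j, 3], [2, 0, 1j]]

delta = min(abs(a - b) for a in lam for b in mu)   # dist(sigma(A), sigma(B))
X = solve_diagonal_sylvester(lam, mu, Y)

# Entrywise check of AX - XB = Y, then the bound (VII.24).
residual = max(abs(lam[i] * X[i][j] - X[i][j] * mu[j] - Y[i][j])
               for i in range(3) for j in range(3))
print(delta, frobenius(X) <= frobenius(Y) / delta)
```

Each entry of $Y$ is divided by a number of modulus at least $\delta$, which is exactly why the Frobenius norm, being an entrywise norm, gives the constant $1/\delta$ with no further loss.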




Example VII.2.9 If $A$ or $B$ is not normal, no inequality like (VII.24)
is true in general. For example, if $A = Y = I$ and $B = \begin{pmatrix} 0 & t \\ 0 & 0 \end{pmatrix}$, then the
equation $AX - XB = Y$ has the solution $X = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$. Here $\delta = 1$, $\|Y\|_2 = \sqrt{2}$,
but $\|X\|_2$ can be made arbitrarily large by choosing $t$ large. Thus we
cannot even have a bound like $\|X\|_2 \le c\|Y\|_2$ for any constant $c$ in this
case.
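
The growth in this example is easy to tabulate. The following sketch assumes the matrices as read above, $A = Y = I$, $B$ with a single entry $t$ in the top right corner, and $X = I + B$; the function names are hypothetical. It verifies $X - XB = I$ entrywise and returns the ratio $\|X\|_2/\|Y\|_2$.

```python
import math

def frobenius(M):
    """Frobenius norm of a matrix given as nested lists."""
    return math.sqrt(sum(abs(e) ** 2 for row in M for e in row))

def norm_ratio(t):
    """Return ||X||_2 / ||Y||_2 for A = Y = I, B = [[0, t], [0, 0]], X = [[1, t], [0, 1]]."""
    X = [[1.0, t], [0.0, 1.0]]
    B = [[0.0, t], [0.0, 0.0]]
    Y = [[1.0, 0.0], [0.0, 1.0]]
    XB = [[sum(X[i][k] * B[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    # With A = I the equation AX - XB = Y reads X - XB = Y; check it entrywise.
    assert all(abs(X[i][j] - XB[i][j] - Y[i][j]) < 1e-12
               for i in range(2) for j in range(2))
    return frobenius(X) / frobenius(Y)

for t in (1.0, 10.0, 100.0):
    print(t, norm_ratio(t))   # equals sqrt(1 + t*t/2), unbounded in t
```

The ratio is $\sqrt{(2 + t^2)/2}$, so it grows without bound although the spectra $\{1\}$ and $\{0\}$ stay at distance $1$.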


Example VII.2.10 In this example all matrices involved are Hermitian: 


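$$A = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}, \qquad B = \begin{pmatrix} -3 & 0 \\ 0 & 1 \end{pmatrix},$$

$$X = \begin{pmatrix} 1 & \sqrt{15} \\ \sqrt{15} & 3 \end{pmatrix}, \qquad Y = \begin{pmatrix} 6 & 2\sqrt{15} \\ 2\sqrt{15} & -6 \end{pmatrix}.$$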


Then $AX - XB = Y$. Here $\delta = 2$. But, for the operator norm, $\|X\| > \frac{1}{\delta}\|Y\|$.
Thus, the inequality $\|X\| \le \frac{1}{\delta}\|Y\|$ need not hold even for Hermitian $A$, $B$.


In the next theorems we will see that we do have $|||X||| \le \frac{c}{\delta}|||Y|||$ for a
small constant $c$ when $A$ and $B$ are normal. When the spectra of $A$ and $B$
are separated in a special way, we can choose $c = 1$.


Theorem VII.2.11 Let $A$ and $B$ be normal operators such that the spectrum
of $B$ is contained in a disk $D(a, \rho)$ and the spectrum of $A$ lies outside
a concentric disk $D(a, \rho + \delta)$. Then, the solution of the equation $AX - XB = Y$
satisfies the inequality

$$|||X||| \le \frac{1}{\delta}|||Y||| \tag{VII.25}$$

for every unitarily invariant norm.


Proof. Applying a translation, we can assume that $a = 0$. Then the
solution $X$ can be expressed as the infinite series (VII.17). From this we
get

$$|||X||| \le \sum_{n=0}^{\infty} \|A^{-1}\|^{n+1}\, |||Y|||\, \|B\|^n \le |||Y||| \sum_{n=0}^{\infty} (\rho + \delta)^{-n-1}\rho^n = \frac{1}{\delta}|||Y|||.$$

∎

Either by taking a limit $\rho \to \infty$ in the above argument or by using the
form of the solution (VII.18), we can prove the following.

Theorem VII.2.12 Let $A$ and $B$ be normal operators with $\sigma(A)$ and $\sigma(B)$
lying in half-planes separated by a strip of width $\delta$. Then the solution of the
equation $AX - XB = Y$ satisfies the inequality (VII.25).


Exercise VII.2.13 Here is an alternate proof of Theorem VII.2.11. Assume,
without loss of generality, that $a = 0$. Then $A$ is invertible. Write
$X = A^{-1}(Y + XB)$ and obtain the inequality (VII.25) directly from this.


Exercise VII.2.14 Choose unit vectors $u$ and $v$ such that $Xv = \|X\|u$
and $X^*u = \|X\|v$; i.e., $u$ and $v$ are left and right singular vectors of
$X$ corresponding to its largest singular value. Then $\langle u, (AX - XB)v\rangle = \|X\|(\langle u, Au\rangle - \langle v, Bv\rangle)$.
Use this to prove Theorem VII.2.11 in the special
case of the operator norm.


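Theorem VII.2.15 Let $A$ and $B$ be Hermitian operators
with $\operatorname{dist}(\sigma(A), \sigma(B)) = \delta > 0$. Then, the solution of the equation
$AX - XB = Y$ satisfies the inequality

$$|||X||| \le \frac{c_1}{\delta}|||Y||| \tag{VII.26}$$

for every unitarily invariant norm, where $c_1$ is a positive real number defined as

$$c_1 = \inf\left\{\|f\|_{L^1} : f \in L^1(\mathbb{R}),\ \hat{f}(s) = \frac{1}{s} \text{ when } |s| \ge 1\right\}. \tag{VII.27}$$


Proof. Let $f_\delta$ be any function in $L^1(\mathbb{R})$ such that $\hat{f}_\delta(s) = \frac{1}{s}$ whenever
$|s| \ge \delta$. By Theorem VII.2.5 we have

$$X = \int_{-\infty}^{\infty} e^{-itA}\, Y e^{itB} f_\delta(t)\,dt.$$

Hence,

$$|||X||| \le |||Y||| \int_{-\infty}^{\infty} |f_\delta(t)|\,dt = \frac{1}{\delta}|||Y||| \int_{-\infty}^{\infty} |f(t)|\,dt,$$

where $f(t) = f_\delta(t/\delta)$. Note that $\hat{f}(s) = \frac{1}{s}$ whenever $|s| \ge 1$. Any $f$ with
this property satisfies the above inequality. ∎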
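
The change of variables behind the last step of this proof can be written out as follows. This is a short verification under the definitions above, with $u = t/\delta$ as the substitution variable (a symbol introduced only here):

$$\hat{f}(s) = \int_{-\infty}^{\infty} e^{-its} f_\delta(t/\delta)\,dt = \delta\int_{-\infty}^{\infty} e^{-iu\delta s} f_\delta(u)\,du = \delta\,\hat{f}_\delta(\delta s) = \frac{1}{s} \quad \text{for } |s| \ge 1,$$

$$\int_{-\infty}^{\infty} |f(t)|\,dt = \delta \int_{-\infty}^{\infty} |f_\delta(u)|\,du, \qquad \text{so} \qquad \int_{-\infty}^{\infty} |f_\delta(t)|\,dt = \frac{1}{\delta}\|f\|_{L^1}.$$

Taking the infimum over all admissible $f$ then gives the constant $c_1/\delta$ in (VII.26).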


Exactly the same argument, using Theorem VII.2.6, now leads to the
following.


Theorem VII.2.16 Let $A$ and $B$ be normal operators with
$\operatorname{dist}(\sigma(A), \sigma(B)) = \delta > 0$. Then the solution of the equation $AX - XB = Y$
satisfies the inequality

$$|||X||| \le \frac{c_2}{\delta}|||Y||| \tag{VII.28}$$

for every unitarily invariant norm, where

$$c_2 = \inf\left\{\|f\|_{L^1} : f \in L^1(\mathbb{R}^2),\ \hat{f}(s_1, s_2) = \frac{1}{s_1 + is_2} \text{ when } s_1^2 + s_2^2 \ge 1\right\}. \tag{VII.29}$$


The exact evaluation of the constants $c_1$ and $c_2$ is an intricate problem
in Fourier analysis. This is discussed in the Appendix at the end of this
chapter. It is known that

$$c_1 = \frac{\pi}{2}$$

and

$$c_2 < \frac{\pi}{2}\int_0^{\pi} \frac{\sin t}{t}\,dt < 2.91.$$

Further, with this value of $c_1$, the inequality (VII.26) is sharp.


VII.3 Perturbation of Eigenspaces 


Given a normal operator $A$ and a subset $S$ of $\mathbb{C}$, we will write $P_A(S)$ for
the orthogonal projection onto the subspace spanned by the eigenvectors
of $A$ corresponding to those of its eigenvalues that lie in $S$.

If $S_1$ and $S_2$ are two disjoint sets, and if $E = P_A(S_1)$ and $F = P_A(S_2)$,
then $E$ and $F$ are mutually orthogonal. If $A$ and $B$ are two normal operators,
and if $E = P_A(S_1)$ and $F = P_B(S_2)$, then we might expect that if $B$
is close to $A$ and $S_1$ and $S_2$ are far apart, then $E$ is nearly orthogonal to
$F$. This is made precise in the theorems below.


Theorem VII.3.1 Let $A, B$ be normal operators. Let $S_1$ and $S_2$ be two
subsets of the complex plane that are separated by either an annulus of
width $\delta$ or a strip of width $\delta$. Let $E = P_A(S_1)$, $F = P_B(S_2)$. Then, for
every unitarily invariant norm,

$$|||EF||| \le \frac{1}{\delta}|||E(A - B)F||| \le \frac{1}{\delta}|||A - B|||. \tag{VII.30}$$


Proof. Since $E$ commutes with $A$ and $F$ with $B$, the first inequality in
(VII.30) can be written as

$$|||EF||| \le \frac{1}{\delta}|||AEF - EFB|||.$$

Now let $EF = X$. This is an operator from the space $\operatorname{ran} F$ to the space
$\operatorname{ran} E$. Restricted to these spaces, the operators $B$ and $A$ have their spectra
inside $S_2$ and $S_1$, respectively. Thus the above inequality follows from
Theorem VII.2.11 when $S_1$ and $S_2$ are separated by an annulus, and from
Theorem VII.2.12 when they are separated by a strip.

The second inequality in (VII.30) is true because $\|E\| = \|F\| = 1$. ∎


The special case of this theorem when $A$ and $B$ are Hermitian, $S_1$ is an
interval $[a, b]$ and $S_2$ the complement in $\mathbb{R}$ of the interval $(a - \delta, b + \delta)$, is
known as the Davis-Kahan $\sin\Theta$ Theorem. (We saw in Section 1 that
$\|EF\|$ is the sine of the angle between $\operatorname{ran} E$ and $\operatorname{ran} F^\perp$.)

With no special assumption on the way $S_1$ and $S_2$ are separated, we can
derive the following two theorems from Theorems VII.2.15 and VII.2.16 by
the argument used above.


Theorem VII.3.2 Let $A$ and $B$ be Hermitian operators, and let $S_1, S_2$ be
any two subsets of $\mathbb{R}$ such that $\operatorname{dist}(S_1, S_2) = \delta > 0$. Let $E = P_A(S_1)$,
$F = P_B(S_2)$. Then, for every unitarily invariant norm,

$$|||EF||| \le \frac{c_1}{\delta}|||E(A - B)F||| \le \frac{c_1}{\delta}|||A - B|||, \tag{VII.31}$$

where $c_1$ is the constant defined by (VII.27). (We know that $c_1 = \frac{\pi}{2}$.)


Theorem VII.3.3 Let $A$ and $B$ be normal operators, and let $S_1, S_2$ be
any two subsets of the complex plane such that $\operatorname{dist}(S_1, S_2) = \delta > 0$. Let
$E = P_A(S_1)$, $F = P_B(S_2)$. Then, for every unitarily invariant norm,

$$|||EF||| \le \frac{c_2}{\delta}|||E(A - B)F||| \le \frac{c_2}{\delta}|||A - B|||, \tag{VII.32}$$

where $c_2$ is the constant defined by (VII.29). (We know that $c_2 < 2.91$.)


Finally, note that for the Frobenius norm alone, we have a stronger result 
as a consequence of Theorem VII.2.8. 


Theorem VII.3.4 Let $A$ and $B$ be normal operators and let $S_1, S_2$ be
any two subsets of the complex plane such that $\operatorname{dist}(S_1, S_2) = \delta > 0$. Let
$E = P_A(S_1)$, $F = P_B(S_2)$. Then

$$\|EF\|_2 \le \frac{1}{\delta}\|E(A - B)F\|_2 \le \frac{1}{\delta}\|A - B\|_2.$$


VII.4 A Perturbation Bound for Eigenvalues

An important corollary of Theorem VII.3.3 is the following bound for the
distance between the eigenvalues of two normal matrices.


Theorem VII.4.1 There exists a constant $c$, $1 < c < 3$, such that the
optimal matching distance $d(\sigma(A), \sigma(B))$ between the eigenvalues of any
two normal matrices $A$ and $B$ is bounded as

$$d(\sigma(A), \sigma(B)) \le c\|A - B\|. \tag{VII.33}$$

Proof. We will show that the inequality (VII.33) is true if $c = c_2$, the
constant in the inequality (VII.32).

Let $\eta = c_2\|A - B\|$ and suppose $d(\sigma(A), \sigma(B)) > \eta$. Then we can find a
$\delta > \eta$ such that $d(\sigma(A), \sigma(B)) > \delta$. By the Marriage Theorem, this is
possible if and only if there exists a set $S_1$ consisting of $k$ eigenvalues of
$A$, $1 \le k \le n$, such that the $\delta$-neighbourhood $\{z : \operatorname{dist}(z, S_1) \le \delta\}$ contains
less than $k$ eigenvalues of $B$. Let $S_2$ be the set of all eigenvalues of $B$ outside
this neighbourhood. Then $\operatorname{dist}(S_1, S_2) > \delta$. Let $E = P_A(S_1)$, $F = P_B(S_2)$.
Then the dimension of the range of $E$ is $k$, and that of the range of $F$ is
at least $n - k + 1$. Hence $\|EF\| = 1$. On the other hand, the inequality
(VII.32) implies that

$$\|EF\| \le \frac{c_2}{\delta}\|A - B\| = \frac{\eta}{\delta} < 1.$$

This is a contradiction. So the inequality (VII.33) is valid if we choose
$c = c_2\ (< 2.91)$.

Example VI.3.13 shows that any constant $c$ for which the inequality
(VII.33) is valid for all normal matrices $A, B$ must be larger than 1.018. ∎


We should remark that, for Hermitian matrices, this reasoning using
Theorem VII.3.2 will give the inequality $d(\sigma(A), \sigma(B)) \le \frac{\pi}{2}\|A - B\|$.
However, in this case, we have the stronger inequality $d(\sigma(A), \sigma(B)) \le \|A - B\|$.
So, this may not be the best method of deriving spectral variation
bounds. However, for normal matrices, nothing more effective has been
found yet.
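
For small examples the optimal matching distance can be computed by brute force. The sketch below assumes the usual definition from the earlier discussion of spectral variation, $d(\sigma(A), \sigma(B)) = \min_\pi \max_i |\alpha_i - \beta_{\pi(i)}|$ with $\pi$ ranging over permutations; the function name and the sample eigenvalue lists are hypothetical. The cost grows like $n!$, so this is meant for illustration only.

```python
from itertools import permutations

def optimal_matching_distance(alphas, betas):
    """min over permutations pi of max_i |alpha_i - beta_pi(i)| (brute force, small n only)."""
    return min(max(abs(a - b) for a, b in zip(alphas, perm))
               for perm in permutations(betas))

# Hypothetical eigenvalue lists (with multiplicity) of two 3x3 Hermitian matrices.
alphas = [0.0, 1.0, 3.0]
betas = [2.5, 0.2, 1.1]

d = optimal_matching_distance(alphas, betas)
print(d)   # the pairing 0 <-> 0.2, 1 <-> 1.1, 3 <-> 2.5 attains the minimum
```

The same function accepts complex eigenvalues, which is the situation relevant to normal matrices in Theorem VII.4.1.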


VII.5 Perturbation of the Polar Factors 


Let $A = UP$ be the polar decomposition of $A$. The positive part $P$ in this
decomposition is $P = |A| = (A^*A)^{1/2}$ and is always unique. The unitary
part $U$ is unique if $A$ is invertible. Then $U = AP^{-1}$.

It is of interest to know how a change in $A$ affects its polar factors $U$ and
$P$. Some results on this are proved below.

Let $A$ and $B$ be invertible operators with polar decompositions $A = UP$
and $B = VQ$, respectively, where $U$ and $V$ are unitary, and $P$ and $Q$ are
positive. Then,

$$|||A - B||| = |||UP - VQ||| = |||P - U^*VQ|||$$

for every unitarily invariant norm. By symmetry,

$$|||A - B||| = |||Q - V^*UP|||.$$

Let

$$Y = P - U^*VQ, \qquad Z = Q - V^*UP.$$

Then

$$Y + Z^* = P(I - U^*V) + (I - U^*V)Q. \tag{VII.34}$$

This equation is of the form studied in Section VII.2. Note that $\sigma(P)$ is
a subset of the real line bounded below by $s_n(A) = \|A^{-1}\|^{-1}$ and $\sigma(Q)$
is a subset of the real line bounded below by $s_n(B) = \|B^{-1}\|^{-1}$. Hence,
$\operatorname{dist}(\sigma(P), \sigma(-Q)) = s_n(A) + s_n(B)$. Hence, by Theorem VII.2.11,

$$|||I - U^*V||| \le \frac{1}{s_n(A) + s_n(B)}|||Y + Z^*|||.$$

Since $|||Y||| = |||Z||| = |||A - B|||$ and $|||I - U^*V||| = |||U - V|||$, this gives the
following theorem.


Theorem VII.5.1 Let $A$ and $B$ be invertible operators, and let $U, V$ be
the unitary factors in their polar decompositions. Then

$$|||U - V||| \le \frac{2}{\|A^{-1}\|^{-1} + \|B^{-1}\|^{-1}}|||A - B||| \tag{VII.35}$$

for every unitarily invariant norm $|||\cdot|||$.
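
In the $1 \times 1$ case the polar decomposition of a nonzero scalar $z$ is $z = (z/|z|)\,|z|$ and $\|z^{-1}\|^{-1} = |z|$, so (VII.35) can be checked by direct computation. The sketch below is only such a scalar sanity check under these assumptions; the function names and sample points are hypothetical.

```python
def phase(z):
    """Unitary factor of the polar decomposition of a nonzero scalar z = (z/|z|) |z|."""
    return z / abs(z)

def check(a, b):
    """Return both sides of (VII.35) for the 1x1 'matrices' a and b."""
    lhs = abs(phase(a) - phase(b))
    rhs = 2.0 * abs(a - b) / (abs(a) + abs(b))   # ||a^{-1}||^{-1} = |a| for scalars
    return lhs, rhs

samples = [(1.0, 2.0), (1.0, 1j), (2 + 1j, -1 + 3j), (0.1, 5j)]
for a, b in samples:
    lhs, rhs = check(a, b)
    print(a, b, lhs <= rhs + 1e-12)
```

A small tolerance is included in the comparison because the two sides can agree up to rounding for some pairs.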

Exercise VII.5.2 Find matrices $A, B$ for which (VII.35) is an equality.


Exercise VII.5.3 Let $A, B$ be invertible operators. Show that

$$|||\,|A| - |B|\,||| \le \left(1 + \frac{2m}{\|A^{-1}\|^{-1} + \|B^{-1}\|^{-1}}\right)|||A - B|||, \tag{VII.36}$$

where $m = \min(\|A\|, \|B\|)$.


For the Frobenius norm alone, a simpler inequality can be obtained as 
shown below. 


Lemma VII.5.4 Let $f$ be a Lipschitz continuous function on $\mathbb{C}$ satisfying
the inequality

$$|f(z) - f(w)| \le k|z - w| \quad \text{for all } z, w \in \mathbb{C}.$$

Then, for all matrices $X$ and all normal matrices $A$, we have

$$\|f(A)X - Xf(A)\|_2 \le k\|AX - XA\|_2.$$


Proof. Assume, without loss of generality, that $A = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.
Then, if $X$ is any matrix with entries $x_{ij}$, we have

$$\|f(A)X - Xf(A)\|_2^2 = \sum_{i,j} |f(\lambda_i) - f(\lambda_j)|^2 |x_{ij}|^2 \le k^2 \sum_{i,j} |\lambda_i - \lambda_j|^2 |x_{ij}|^2 = k^2\|AX - XA\|_2^2.$$

∎

Lemma VII.5.5 Let $f$ be a function satisfying the conditions of Lemma
VII.5.4. Let $A, B$ be any two normal matrices. Then, for every matrix $X$,

$$\|f(A)X - Xf(B)\|_2 \le k\|AX - XB\|_2.$$


Proof. Let $T = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$, $Y = \begin{pmatrix} 0 & X \\ 0 & 0 \end{pmatrix}$. Replace $A$ and $X$ in Lemma VII.5.4
by $T$ and $Y$, respectively. ∎

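Corollary VII.5.6 If $A$ and $B$ are normal matrices, then

$$\|\,|A| - |B|\,\|_2 \le \|A - B\|_2. \tag{VII.37}$$

Theorem VII.5.7 Let $A, B$ be any two matrices. Then

$$\|\,|A| - |B|\,\|_2^2 + \|\,|A^*| - |B^*|\,\|_2^2 \le 2\|A - B\|_2^2. \tag{VII.38}$$


Proof. Let $T = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}$, $S = \begin{pmatrix} 0 & B \\ B^* & 0 \end{pmatrix}$. Then $T$ and $S$ are Hermitian. Note
that $|T| = \begin{pmatrix} |A^*| & 0 \\ 0 & |A| \end{pmatrix}$. So, the inequality (VII.38) follows from (VII.37). ∎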


It follows from (VII.38) that

$$\|\,|A| - |B|\,\|_2 \le \sqrt{2}\,\|A - B\|_2. \tag{VII.39}$$

The next example shows that the Lipschitz constant $\sqrt{2}$ in the above
inequality cannot be replaced by a smaller number.


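Example VII.5.8 Let

\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & \varepsilon \\ 0 & 0 \end{pmatrix}.
\]

Then $|A| = A$ and

\[
|B| = \frac{1}{\sqrt{1 + \varepsilon^2}} \begin{pmatrix} 1 & \varepsilon \\ \varepsilon & \varepsilon^2 \end{pmatrix}.
\]

As $\varepsilon \to 0$, the ratio $\|\, |A| - |B| \,\|_2 / \|A - B\|_2$ approaches $\sqrt{2}$.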
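The limit in Example VII.5.8 can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `abs_2x2`, `frobenius`, `diff` and `ratio` are hypothetical, and the absolute value is computed with the closed form for the square root of a 2 x 2 positive semidefinite matrix, which is an implementation choice made here.

```python
import math

def abs_2x2(m):
    """Return |M| = (M^T M)^{1/2} for a real 2x2 matrix M."""
    (a, b), (c, d) = m
    # Entries of the positive semidefinite matrix P = M^T M.
    p11, p12, p22 = a * a + c * c, a * b + c * d, b * b + d * d
    s = math.sqrt(max(p11 * p22 - p12 * p12, 0.0))  # sqrt(det P)
    t = math.sqrt(p11 + p22 + 2.0 * s)               # trace of sqrt(P)
    if t == 0.0:
        return ((0.0, 0.0), (0.0, 0.0))
    # Cayley-Hamilton gives sqrt(P) = (P + sqrt(det P) I) / tr(sqrt(P)).
    return (((p11 + s) / t, p12 / t), (p12 / t, (p22 + s) / t))

def frobenius(m):
    """Frobenius (Hilbert-Schmidt) norm of a real matrix."""
    return math.sqrt(sum(x * x for row in m for x in row))

def diff(x, y):
    """Entrywise difference x - y."""
    return tuple(tuple(p - q for p, q in zip(r1, r2)) for r1, r2 in zip(x, y))

def ratio(eps):
    """|| |A| - |B| ||_2 / || A - B ||_2 for the matrices of Example VII.5.8."""
    a = ((1.0, 0.0), (0.0, 0.0))
    b = ((1.0, eps), (0.0, 0.0))
    return frobenius(diff(abs_2x2(a), abs_2x2(b))) / frobenius(diff(a, b))

for eps in (1.0, 0.1, 1e-3):
    print(eps, ratio(eps))
```

The printed ratios increase towards $\sqrt{2}$ as `eps` decreases while staying below it, in agreement with (VII.39).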


We will continue the study of perturbation of the function |A| in later 
chapters. 

A useful consequence of Theorem VII.5.7 is the following perturbation 
bound for singular vectors. 


Theorem VII.5.9 Let $S_1, S_2$ be two subsets of the positive half-line such
that $\mathrm{dist}(S_1, S_2) = \delta > 0$. Let A and B be any two matrices. Let E and
E' be the orthogonal projections onto the subspaces spanned by the right
and the left singular vectors of A corresponding to its singular values in
$S_1$. Let F and F' be the projections associated with B in the same way,
corresponding to its singular values in $S_2$. Then

\[
\left(\|EF\|_2^2 + \|E'F'\|_2^2\right)^{1/2} \le \frac{\sqrt{2}}{\delta}\, \|A - B\|_2. \tag{VII.40}
\]
Proof. By Theorem VII.3.4 we have

\begin{align*}
\|EF\|_2 &\le \frac{1}{\delta}\, \|\, |A| - |B| \,\|_2, \\
\|E'F'\|_2 &\le \frac{1}{\delta}\, \|\, |A^*| - |B^*| \,\|_2.
\end{align*}

These inequalities, together with (VII.38), lead to the inequality (VII.40). ■


VII.6 Appendix: Evaluating the (Fourier) 
constants 


The analysis in Section VII.2 has led to some extremal problems in Fourier 
analysis. Here we indicate how the constants $c_1$ and $c_2$ defined by (VII.27)
and (VII.29) may be evaluated.

The symbol $\|f\|_1$ will denote the norm in the space $L^1$ for functions
defined on $\mathbb{R}$ or on $\mathbb{R}^2$.

We are required to find a function f in $L^1$, with minimal norm, such
that $\hat f(s) = \frac{1}{s}$ when $|s| \ge 1$. Since $\hat f$ must be continuous, we might begin
by taking a continuous function that coincides with $\frac{1}{s}$ for $|s| \ge 1$ and
then taking f to be its inverse Fourier transform. The difficulty is that the
function $\frac{1}{s}$ is not in $L^1$, and hence its inverse Fourier transform may not
be defined. Note, however, that the function $\frac{1}{s}$ is square integrable at $\infty$.
So it is the Fourier transform of an $L^2$ function. We will show that under
suitable conditions its inverse Fourier transform is in $L^1$, and find one that
has the least norm.

Since the function $\frac{1}{s}$ is an odd function, it would seem economical to
extend it inside the domain $(-1, 1)$, so that the extended function is an
odd function on $\mathbb{R}$. This is indeed so. Let $f \in L^1(\mathbb{R})$ and suppose $\hat f(s) = \frac{1}{s}$
when $|s| \ge 1$. Let $f_{\mathrm{odd}}$ be the odd part of f, $f_{\mathrm{odd}}(t) = \frac{f(t) - f(-t)}{2}$. Then
$\hat f_{\mathrm{odd}}(s) = \frac{1}{s}$ when $|s| \ge 1$ and $\|f_{\mathrm{odd}}\|_1 \le \|f\|_1$. Thus the constant $c_1$ is also
the infimum of $\|f\|_1$ over all odd functions in $L^1$ for which $\hat f(s) = \frac{1}{s}$ when
$|s| \ge 1$.


Now note that if f is odd, then

\begin{align*}
\hat f(s) &= \int f(t) e^{-its}\, dt = -i \int f(t) \sin ts\, dt \\
&= -i \int \mathrm{Re}\, f(t) \sin ts\, dt + \int \mathrm{Im}\, f(t) \sin ts\, dt.
\end{align*}

If this is to be equal to $\frac{1}{s}$ when $|s| \ge 1$, the Fourier transform of $\mathrm{Re}\, f$ should
have its support in $(-1, 1)$. Thus, it is enough to consider purely imaginary
functions f in our extremal problem. Equivalently, let $\mathcal{C}$ be the class of all
odd real functions in $L^1(\mathbb{R})$ such that $\hat f(s) = \frac{1}{is}$ when $|s| \ge 1$. Then

\[
c_1 = \inf\{\|f\|_1 : f \in \mathcal{C}\}. \tag{VII.41}
\]

Now, let g be any bounded function with period $2\pi$ having a Fourier
series expansion $\sum_{n \ne 0} a_n e^{int}$. This last condition means that $\int_0^{2\pi} g(t)\, dt = 0$.
Then, for any f in $\mathcal{C}$,

\[
\int f(t)\, g(x - t)\, dt = -i \sum_{n \ne 0} \frac{a_n}{n}\, e^{inx}. \tag{VII.42}
\]


Note that this expression does not depend on the choice of f.
For a real number x, let $\mathrm{sgn}\, x$ be defined as $-1$ if x is negative and 1 if x
is nonnegative. Let $f_0$ be an element of $\mathcal{C}$ such that

\[
\mathrm{sgn}\, f_0(t) = \mathrm{sgn} \sin t. \tag{VII.43}
\]


Note that the function $\mathrm{sgn}\, f_0$ then satisfies the requirements made on g in
the preceding paragraph. Hence, we have

\begin{align*}
\int |f_0(t)|\, dt &= \int f_0(t)\, \mathrm{sgn}\, f_0(t)\, dt = -\int f_0(t)\, \mathrm{sgn}\, f_0(-t)\, dt \\
&= -\int f(t)\, \mathrm{sgn}\, f_0(-t)\, dt \le \int |f(t)|\, dt
\end{align*}

for every $f \in \mathcal{C}$. (Use (VII.42) with x = 0 and see the remark following it.)
Thus $c_1 = \|f_0\|_1$, where $f_0$ is any function in $\mathcal{C}$ satisfying (VII.43). We will
now exhibit such a function.

We have remarked earlier that it is natural to obtain $f_0$ as the inverse
Fourier transform of a continuous odd function $\varphi$ such that $\varphi(s) = \frac{1}{s}$ for
$|s| \ge 1$. First we must find a good sufficient condition on $\varphi$ so that its
inverse Fourier transform

\[
\check\varphi(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi(s)\, e^{its}\, ds = \frac{i}{2\pi} \int_{-\infty}^{\infty} \varphi(s) \sin ts\, ds
\]

(which, by definition, is in $L^2$) is in $L^1$. Suppose $\varphi$ is differentiable on
the whole real line and its derivative $\varphi'$ is of bounded variation. Then, an
integration by parts shows

\[
\int_{-\infty}^{\infty} \varphi(s) \sin ts\, ds = \frac{1}{t} \int_{-\infty}^{\infty} \cos ts\; \varphi'(s)\, ds.
\]
Another integration by parts, this time for Riemann-Stieltjes integrals,
shows that

\[
\int_{-\infty}^{\infty} \varphi(s) \sin ts\, ds = -\frac{1}{t^2} \int_{-\infty}^{\infty} \sin ts\; d\varphi'(s).
\]

As $t \to \infty$ this decays as $\frac{1}{t^2}$. So $\check\varphi(t)$ is integrable at $\infty$. Since $\check\varphi$ is in $L^2$, it
is integrable over any bounded interval. Hence, $\check\varphi$ is in $L^1$.

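We will now find a function $\varphi_0$ that satisfies the conditions of the above
paragraph, and show that if $f_0 = -i\check\varphi_0$, then $f_0$ satisfies the condition (VII.43).
One such function is

\[
\varphi_0(s) = \begin{cases}
1/s & \text{for } |s| \ge 1, \\
\dfrac{1}{s} - \dfrac{\pi}{2} \cot \dfrac{\pi}{2} s & \text{for } 0 < |s| < 1, \\
0 & \text{for } s = 0.
\end{cases} \tag{VII.44}
\]

From the familiar series expansion

\[
\frac{\pi}{2} \cot \frac{\pi}{2} s = \frac{1}{s} + \sum_{n=1}^{\infty} \frac{2s}{s^2 - 4n^2}
\]

(see L.V. Ahlfors, Complex Analysis, 2nd ed., p. 188) one sees that

\[
\varphi_0(s) = \sum_{n=1}^{\infty} \frac{2s}{4n^2 - s^2} \quad \text{for } 0 < s < 1.
\]

This shows that $\varphi_0$ is a convex function in $0 < s < 1$, and hence $\varphi_0'$ is
of bounded variation in this domain. On the rest of the positive half-line
too $\varphi_0'$ is of bounded variation. So $\varphi_0$ does meet the conditions that are
sufficient to ensure that $f_0 = -i\check\varphi_0$ is in $\mathcal{C}$.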


Using the definition of $\varphi_0$, it is straightforward to verify that for $t > 0$,

\begin{align*}
2 f_0(t) &= 1 - \int_0^1 \cot \frac{\pi s}{2}\, \sin ts\, ds, \\
2 f_0(t) - 2 f_0(t + \pi) &= \frac{\sin t}{t} + \frac{\sin(t + \pi)}{t + \pi}, \\
f_0(t) - f_0(t + \pi) &= \left[\frac{1}{2t} - \frac{1}{2(t + \pi)}\right] \sin t.
\end{align*}

The quantity inside the brackets is positive for all $t > 0$. Since $f_0(t) \to 0$ as
$t \to \infty$, we can write

\[
f_0(t) = \sum_{n=0}^{\infty} \left[f_0(t + n\pi) - f_0(t + (n+1)\pi)\right] = h(t) \sin t,
\]

where $h(t) > 0$. This shows that $f_0$ satisfies the condition (VII.43).
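As a sanity check, the integral representation $2f_0(t) = 1 - \int_0^1 \cot(\pi s/2) \sin ts\, ds$ and the difference formula $2f_0(t) - 2f_0(t+\pi) = \frac{\sin t}{t} + \frac{\sin(t+\pi)}{t+\pi}$ can be compared numerically. The sketch below is an illustration only; the function names `simpson` and `two_f0` and the choice of Simpson's rule are assumptions made here, not part of the text.

```python
import math

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0

def two_f0(t):
    """Evaluate 2 f_0(t) = 1 - int_0^1 cot(pi s / 2) sin(t s) ds for t > 0."""
    def integrand(s):
        if s == 0.0:
            return 2.0 * t / math.pi  # limit of cot(pi s/2) sin(ts) as s -> 0
        half = math.pi * s / 2.0
        return math.cos(half) / math.sin(half) * math.sin(t * s)
    return 1.0 - simpson(integrand, 0.0, 1.0)

def difference_formula(t):
    """Right-hand side sin(t)/t + sin(t + pi)/(t + pi)."""
    return math.sin(t) / t + math.sin(t + math.pi) / (t + math.pi)

for t in (0.7, 2.0, 5.0):
    print(t, two_f0(t) - two_f0(t + math.pi), difference_formula(t))
```

For each sample value of t the two printed numbers agree to within the quadrature error, and their common sign is that of $\sin t$, as the argument requires.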


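Finally, $\|f_0\|_1$ can be evaluated from the available data. We can easily
see that

\[
\mathrm{sgn} \sin t = \frac{4}{\pi} \left(\sin t + \frac{1}{3} \sin 3t + \frac{1}{5} \sin 5t + \cdots\right).
\]

Hence, using (VII.42) we obtain

\begin{align*}
\int |f_0(t)|\, dt &= -\int f_0(t)\, \mathrm{sgn}\, f_0(-t)\, dt \\
&= \frac{4}{\pi} \left(1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots\right) = \frac{\pi}{2}.
\end{align*}

We have shown that $c_1 = \frac{\pi}{2}$. This result, and its proof given above, are
due to B. Sz.-Nagy.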
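The value $c_1 = \pi/2$ rests on the classical sum $1 + 1/3^2 + 1/5^2 + \cdots = \pi^2/8$. A short sketch of the numerical check follows; the function name `c1_partial_sum` is hypothetical and chosen here for illustration.

```python
import math

def c1_partial_sum(terms):
    """Partial sum (4/pi) * sum_{k < terms} 1/(2k+1)^2 approximating ||f_0||_1."""
    return (4.0 / math.pi) * sum(1.0 / (2 * k + 1) ** 2 for k in range(terms))

for terms in (10, 1000, 100000):
    print(terms, c1_partial_sum(terms), math.pi / 2)
```

The partial sums increase towards $\pi/2$, with a tail of order $1/(\pi N)$ after N terms.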

The two-variable problem, through which $c_2$ is defined, is more complicated.
The exact value of $c_2$ is not known. We will show that $c_2$ is finite
by showing that there does exist a function f in $L^1(\mathbb{R}^2)$ such that
$\hat f(s_1, s_2) = \frac{1}{s_1 + i s_2}$ when $s_1^2 + s_2^2 \ge 1$. We will then sketch an argument that
leads to an estimate of $c_2$, skipping the technical details.

It is convenient to identify a point (x, y) in $\mathbb{R}^2$ with the complex variable
$z = x + iy$. The differential operator $\frac{\partial}{\partial \bar z} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i \frac{\partial}{\partial y}\right)$ annihilates every
complex holomorphic function. It is a well-known fact (see, e.g., W. Rudin,
Functional Analysis, p. 205) that the Fourier transform of the tempered
distribution $\frac{1}{z}$ is $-\frac{2\pi i}{z}$. (The normalisations we have chosen are different
from those of Rudin.)

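Let $\varphi$ be a $C^\infty$ function on $\mathbb{R}^2$ that vanishes in a neighbourhood of the
origin, and is 1 outside another neighbourhood of the origin. Let $\psi(z) = \frac{\varphi(z)}{z}$.
We will show that the inverse Fourier transform $\check\psi$ is in $L^1$. Note that

\[
\eta(z) := \frac{\partial}{\partial \bar z} \psi(z) = \frac{1}{z} \frac{\partial \varphi(z)}{\partial \bar z}.
\]

This is a $C^\infty$ function with compact support. Hence, $\eta$ is in the Schwartz
space $\mathcal{S}$. Let $\check\eta \in \mathcal{S}$ be its inverse Fourier transform. Then (ignoring constant
factors) $\check\psi(z) = \check\eta(z)/z$. Since $\check\eta$ is integrable at $\infty$, so is $\check\psi$. At the origin,
$\frac{1}{z}$ is integrable and $\check\eta(z)$ bounded. Hence $\check\psi$ is integrable at the origin.
This shows that $c_2 < \infty$.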


Consider the tempered distribution $f_0(z) = -\frac{1}{2\pi i z}$. We know that $\hat f_0(\zeta) = \frac{1}{\zeta}$.
However, $f_0 \notin L^1$. To fix it up we seek an element p in the space of
tempered distributions $\mathcal{S}'$ such that

(i) $p \in L^1_{\mathrm{loc}}$ and $\mathrm{supp}\, \hat p$ is contained in the unit disk D,
(ii) if $f = f_0 + p$, then f is in $L^1$.


Note that $c_2 = \inf \|f\|_1$ over such p.


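Writing $z = re^{i\theta}$, one sees that

\[
\|f\|_1 = \frac{1}{2\pi} \int_0^{\infty} r\, dr \int_{-\pi}^{\pi} \left|\frac{1}{r} - i e^{i\theta}\, 2\pi p(z)\right| d\theta.
\]

Let

\[
F(r) = \int_{-\pi}^{\pi} i e^{i\theta} p(z)\, d\theta. \tag{VII.45}
\]

Then

\[
\int_{-\pi}^{\pi} \left|\frac{1}{r} - i e^{i\theta}\, 2\pi p(z)\right| d\theta \ge 2\pi \left|\frac{1}{r} - F(r)\right|,
\]

and there is equality here if $e^{i\theta} p(re^{i\theta})$ is independent of $\theta$. Hence, we can
restrict attention to only those p that satisfy the additional condition

(iii) $z p(z)$ is a radial function.

Putting

\[
G(r) = 1 - r F(r), \tag{VII.46}
\]

we see that

\[
c_2 = \inf \int_0^{\infty} |G(r)|\, dr, \tag{VII.47}
\]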


where G is defined via (VII.45) and (VII.46) for all p that satisfy the 


conditions (i), (ii), and (iii) above. The two-variable minimisation problem 
is thus reduced to a one-variable problem. 


Using the conditions on p, one can characterise the functions G that
enter here. This involves a little more intricate analysis, which we will
skip. The conclusion is that the functions G that enter in (VII.47) are all
$L^1$ functions of the form $G = \hat g$, where g is a continuous even function
supported in $[-1, 1]$ such that $\int_{-1}^{1} g(t)\, dt = 1$. In other words,

\[
c_2 = \inf\left\{\int_0^{\infty} |\hat g(t)|\, dt : g \text{ even},\ \mathrm{supp}\, g = [-1, 1],\ \int g = 1,\ \hat g \in L^1\right\}. \tag{VII.48}
\]

If we choose g to be the function $g(t) = 1 - |t|$, then $\hat g(t) = \sin^2\!\left(\frac{t}{2}\right) \big/ \left(\frac{t}{2}\right)^2$.
This gives the estimate $c_2 \le \pi$.
A better estimate is obtained from the function

\[
g(t) = \frac{\pi}{4} \cos \frac{\pi}{2} t \quad \text{for } |t| \le 1.
\]

Then

\[
\hat g(t) = \frac{\pi^2 \cos t}{\pi^2 - 4t^2}.
\]

A little computation shows that

\[
\int_0^{\infty} |\hat g(t)|\, dt = \frac{\pi}{2} \int_0^{\pi} \frac{\sin t}{t}\, dt < 2.90901.
\]

Thus $c_2 < 2.91$.
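The numerical value of the bound $\frac{\pi}{2} \int_0^{\pi} \frac{\sin t}{t}\, dt$ can be reproduced with any standard quadrature rule. The sketch below uses composite Simpson integration; the names `simpson`, `sinc`, `bound_cosine` and `bound_triangle` are hypothetical, and the code only evaluates the closed-form expression stated above rather than the integral of $|\hat g|$ itself.

```python
import math

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0

def sinc(t):
    """sin(t)/t, extended by continuity to t = 0."""
    return 1.0 if t == 0.0 else math.sin(t) / t

# Estimate from g(t) = (pi/4) cos(pi t / 2): (pi/2) * Si(pi).
bound_cosine = (math.pi / 2.0) * simpson(sinc, 0.0, math.pi)
# Estimate from the triangle function g(t) = 1 - |t|.
bound_triangle = math.pi

print(bound_cosine, bound_triangle)
```

The first number is close to 2.909 and lies below both 2.91 and the triangle-function estimate $\pi$.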


The interested reader may find the details in the paper An extremal 
problem in Fourier analysis with applications to operator theory, by R. 
Bhatia, C. Davis, and P. Koosis, J. Functional Analysis, 82 (1989) 138-150. 


VII.7 Problems 


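Problem VII.6.1. Let $\mathcal{E}$ be any subspace of $\mathbb{C}^n$. For any vector x let

\[
\delta(x, \mathcal{E}) = \min_{y \in \mathcal{E}} \|x - y\|.
\]

Then $\delta(x, \mathcal{E})$ is equal to $\|(I - E)x\|$. If $\mathcal{E}, \mathcal{F}$ are two subspaces of $\mathbb{C}^n$, let

\[
\rho(\mathcal{E}, \mathcal{F}) = \max\Big\{ \max_{x \in \mathcal{E},\ \|x\| = 1} \delta(x, \mathcal{F}),\ \max_{y \in \mathcal{F},\ \|y\| = 1} \delta(y, \mathcal{E}) \Big\}.
\]

Let $\dim \mathcal{E} = \dim \mathcal{F}$, and let $\Theta$ be the angle operator between $\mathcal{E}$ and $\mathcal{F}$.
Show that

\[
\rho(\mathcal{E}, \mathcal{F}) = \|\sin \Theta\| = \|E - F\|,
\]

where E and F are the orthogonal projections onto the spaces $\mathcal{E}$ and $\mathcal{F}$.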


Problem VII.6.2. Let A, B be operators whose spectra are disjoint from
each other. Show that the operator $\begin{pmatrix} A & C \\ 0 & B \end{pmatrix}$ is similar to $\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$ for every C.


Problem VII.6.3. Let A, B be operators whose spectra are disjoint from
each other. Show that if C commutes with A + B and with AB, then C
commutes with both A and B.


Problem VII.6.4. The equation $AX + XA^* = -I$ is called the Lyapunov
equation. Show that if $\sigma(A)$ is contained in the open left half-plane, then
the Lyapunov equation has a unique solution X, and this solution is positive
and invertible.
Problem VII.6.5. Let A and B be any two matrices. Suppose that all
singular values of A are at a distance greater than $\delta$ from any singular value
of B. Show that for every X,

\[
\|X\|_2 \le \frac{1}{\delta} \left(\frac{\|AX - XB\|_2^2 + \|A^*X - XB^*\|_2^2}{2}\right)^{1/2}.
\]


Problem VII.6.6. Let A, B be normal operators. Let $S_1$ and $S_2$ be two
subsets of the complex plane separated by a strip of width $\delta$. Let $E = P_A(S_1)$,
$F = P_B(S_2)$. Suppose $E(A - B)E = 0$. If T(t) is the function
$T(t) = t/\sqrt{1 - t^2}$, show that

\[
\|T(|EF|)\| \le \frac{1}{\delta}\, \|E(A - B)\|.
\]

Prove that this inequality is also valid for all unitarily invariant norms.
This is called the $\tan \Theta$ theorem.


Problem VII.6.7. Show that the inequality (VII.28) cannot be true if 


C2 < 4. (Hint: Choose the trace norm and find suitable unitary matrices 
A, B.) 


Problem VII.6.8. Show that the conclusion of Theorem VII.2.8 cannot
be true for any Schatten p-norm if $p \ne 2$.


Problem VII.6.9. Let A, B be unitary matrices, and let $\mathrm{dist}(\sigma(A), \sigma(B))
= \delta = \sqrt{2}$. If some eigenvalue of A is at distance greater than $\sqrt{2}$ from $\sigma(B)$,
then $\sigma(A)$ and $\sigma(B)$ can be separated by a strip of width $\sqrt{2}$. In this case,
the solution of $AX - XB = Y$ can be obtained from Theorem VII.2.3.
Assume that all points of $\sigma(A)$ are at distance $\sqrt{2}$ from all points of $\sigma(B)$.
Show that the solution one obtains using Theorem VII.2.7 in this case is


X=1/2A'Y—-1/4YB+1/4A“YB"!. 
If o(A) = {1,—1} and o(B) = {i, —2}, this reduces to 


X =1/2(AY -YB). 


Problem VII.6.10. A reformulation of the Sylvester equation in terms
of tensor products is outlined below. Let $\varphi$ be the natural isomorphism
between the Hilbert spaces $\mathcal{H} \otimes \mathcal{H}^*$ and $\mathcal{L}(\mathcal{H})$ constructed in Exercise I.4.4.
Show that for every operator A and for each $E_{ij}$,

\begin{align*}
\varphi (A \otimes I) \varphi^{-1}(E_{ij}) &= A E_{ij}, \\
\varphi (I \otimes A) \varphi^{-1}(E_{ij}) &= E_{ij} A^T,
\end{align*}

where $A^T$ is the transpose of A.

Thus the multiplication operator $\mathcal{A}(X) = AX$ on $\mathcal{L}(\mathcal{H})$ can be identified
with the operator $A \otimes I$ on $\mathcal{H} \otimes \mathcal{H}^*$, and the operator $\mathcal{B}(X) = XB$ can
be identified with $I \otimes B^T$. The operator $\mathcal{T} = \mathcal{A} - \mathcal{B}$ then corresponds to
$A \otimes I - I \otimes B^T$.

Use this to give another proof of Theorem VII.2.1.

Sometimes it is more convenient to identify $\mathcal{L}(\mathcal{H})$ with $\mathcal{H} \otimes \mathcal{H}$ instead of
$\mathcal{H} \otimes \mathcal{H}^*$. In this case, we have a bijection $\varphi$ from $\mathcal{H} \otimes \mathcal{H}$ onto $\mathcal{L}(\mathcal{H})$ that
is linear in the first variable and conjugate-linear in the second. With this
identification, the operator $\mathcal{A}$ on $\mathcal{L}(\mathcal{H})$ corresponds to the operator $A \otimes I$,
while the operator $\mathcal{B}$ corresponds to $I \otimes B^*$.
due to M. Rosenblum, On the operator equation BX — XA = Q, Duke
Math. J., 23 (1956) 263-270. Much of the rest of Section VII.2 is based on 
the paper by R. Bhatia, C. Davis and A. McIntosh, Perturbation of spectral 
subspaces and solution of linear operator equations, Linear Algebra Appl., 
52/53 (1983) 45-67. 

For Hermitian operators, Theorem VII.3.1 was proved in the Davis-Kahan
paper cited above. This is very well known among numerical analysts
as the $\sin \Theta$ theorem and has been used frequently by them. This
paper also contains a $\tan \Theta$ theorem (see Problem VII.6.6) as well as $\sin 2\Theta$
and $\tan 2\Theta$ theorems, all for Hermitian operators. Theorem VII.3.4 (for Her-
In Theorem VII.2.5, all that is required of f is that $\hat f(s) = \frac{1}{s}$ when
$s \in \sigma(A) - \sigma(B)$. This fixes the value of $\hat f$ only at $n^2$ points if we are
dealing with $n \times n$ matrices. For each n, let b(n) be the smallest constant
for which we have

\[
|||X||| \le \frac{b(n)}{\delta}\, |||AX - XB|||,
\]

whenever A, B are $n \times n$ Hermitian matrices such that $\mathrm{dist}(\sigma(A), \sigma(B)) = \delta$.
R. McEachin has shown that

\[
b(2) = \frac{\sqrt{6}}{2} \approx 1.22474 \quad \text{(see Example VII.2.10)},
\]

\[
b(3) = \frac{8 + 5\sqrt{10}}{18} \approx 1.32285,
\]

and that

\[
b = \lim_{n \to \infty} b(n) = \frac{\pi}{2}.
\]

Thus the inequality (VII.26) is sharp with $c_1 = \frac{\pi}{2}$. See R. McEachin,
A sharp estimate in an operator inequality, Proc. Amer. Math. Soc., 115
(1992) 161-165 and Analyzing specific cases of an operator inequality, Linear
Algebra Appl., 208/209 (1994) 343-365.

The quantity $\rho(\mathcal{E}, \mathcal{F})$ defined in Problem VII.6.1 is sometimes called the
gap between $\mathcal{E}$ and $\mathcal{F}$. This and related measures of the distance between
two subspaces of a Banach space are used extensively by T. Kato, Perturbation
Theory for Linear Operators, Chapter 4.


VIII


Spectral Variation of Nonnormal 
Matrices 


In Chapter 6 we saw that if A and B are both Hermitian or both unitary,
then the optimal matching distance $d(\sigma(A), \sigma(B))$ is bounded by $\|A - B\|$.
We also saw that for arbitrary normal matrices A, B this need not always
be true (Example VI.3.13). However, in this case, we do have a slightly
weaker inequality $d(\sigma(A), \sigma(B)) < 3\|A - B\|$ (Theorem VII.4.1). If one of
the matrices A, B is Hermitian and the other is arbitrary, then we can only
have an inequality of the form $d(\sigma(A), \sigma(B)) \le c(n)\|A - B\|$, where c(n)
is a constant that grows like $\log n$ (Problems VI.8.8 and VI.8.9).

A more striking change of behaviour takes place if no restriction is placed
on either A or B. Let A be the $n \times n$ nilpotent upper Jordan matrix; i.e., the
matrix that has all entries 1 on its first diagonal above the main diagonal
and all other entries 0. Let B be the matrix obtained from A by adding
an entry $\varepsilon$ in the bottom left corner. Then the eigenvalues of B are the
nth roots of $\varepsilon$. So $d(\sigma(A), \sigma(B)) = \varepsilon^{1/n}$, whereas $\|A - B\| = \varepsilon$. When $\varepsilon$ is
small, the quantity $\varepsilon^{1/n}$ is much larger. No inequality like $d(\sigma(A), \sigma(B)) \le
c(n)\|A - B\|$ can be true in this case.
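The claim about the perturbed Jordan block can be verified directly: for each nth root $\lambda$ of $\varepsilon$, the vector $(1, \lambda, \ldots, \lambda^{n-1})$ is an eigenvector of B. The following sketch illustrates this with n = 8 and $\varepsilon = 10^{-8}$; the function names are hypothetical and the sample values are chosen here only for illustration.

```python
import cmath

def perturbed_jordan(n, eps):
    """Nilpotent upper Jordan block of size n with eps added in the bottom left corner."""
    b = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        b[i][i + 1] = 1.0
    b[n - 1][0] = eps
    return b

def residual(b, lam):
    """max-norm of B v - lam v for the candidate eigenvector v = (1, lam, ..., lam^(n-1))."""
    n = len(b)
    v = [lam ** k for k in range(n)]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return max(abs(bv[i] - lam * v[i]) for i in range(n))

n, eps = 8, 1e-8
B = perturbed_jordan(n, eps)
# The n complex nth roots of eps.
roots = [eps ** (1.0 / n) * cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

print(max(residual(B, lam) for lam in roots))   # eigenvector residuals
print(abs(roots[0]), eps)                        # eps^(1/n) versus ||A - B|| = eps
```

Since every eigenvalue of A is 0, the optimal matching distance equals $\varepsilon^{1/n}$, which is about 0.1 here although the perturbation has norm $10^{-8}$.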

In this chapter we will obtain bounds for d(a(A), o(B)), where A,B 
are arbitrary matrices. These bounds are much weaker than the ones for 
normal matrices. We will also obtain stronger results for matrices that are 
not normal but have some other special properties. 
Throughout this section, A and B will be two $n \times n$ matrices with eigenvalues
$\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$, respectively. In Section VI.3, we introduced
the notation

\[
s(\sigma(B), \sigma(A)) = \max_j \min_i |\alpha_i - \beta_j|. \tag{VIII.1}
\]

A bound for this number is given in the following theorem.

Theorem VIII.1.1 Let A, B be $n \times n$ matrices. Then

\[
s(\sigma(B), \sigma(A)) \le (\|A\| + \|B\|)^{1 - 1/n}\, \|A - B\|^{1/n}. \tag{VIII.2}
\]

Proof. Let j be the index for which the maximum in the definition
(VIII.1) is attained. Choose an orthonormal basis $e_1, \ldots, e_n$ such that
$Be_1 = \beta_j e_1$. Then

\begin{align*}
[s(\sigma(B), \sigma(A))]^n &= [\min_i |\alpha_i - \beta_j|]^n \\
&\le \prod_{i=1}^{n} |\alpha_i - \beta_j| = |\det(A - \beta_j I)| \\
&\le \|(A - \beta_j I)e_1\| \cdots \|(A - \beta_j I)e_n\|
\end{align*}

by Hadamard's inequality (Exercise I.1.3). The first factor on the right-hand
side of the above inequality can be written as $\|(A - B)e_1\|$ and is,
therefore, bounded by $\|A - B\|$. The remaining $n - 1$ factors can be bounded
as

\[
\|(A - \beta_j I)e_k\| \le \|Ae_k\| + |\beta_j| \le \|A\| + \|B\|, \quad \text{for } k = 2, 3, \ldots, n.
\]

This is adequate to derive (VIII.2). ■


Example VIII.1.2 Let $A = -B = I$. Then the two sides of (VIII.2) are
equal.


Compare this theorem with Theorem VI.3.3.
Since the right-hand side of (VIII.2) is symmetric in A and B, we have
a bound for the Hausdorff distance as well:

\[
h(\sigma(A), \sigma(B)) \le (\|A\| + \|B\|)^{1 - 1/n}\, \|A - B\|^{1/n}. \tag{VIII.3}
\]


Exercise VIII.1.3 A bound for the optimal matching distance
$d(\sigma(A), \sigma(B))$ can be derived from Theorem VIII.1.1. The argument is
similar to the one used in Problem VI.8.6 and is outlined below.

(i) Fix A, and for any B let

\[
\varepsilon(B) = (2M)^{1 - 1/n}\, \|A - B\|^{1/n},
\]

where $M = \max(\|A\|, \|B\|)$. Let $\alpha_1, \ldots, \alpha_n$ be the eigenvalues of A. Let
$D(\alpha_i, \varepsilon(B))$ be the closed disk with radius $\varepsilon(B)$ and centre $\alpha_i$. Then, Theorem
VIII.1.1 says that $\sigma(B)$ is contained in the set D obtained by taking
the union of these disks.

(ii) Let $A(t) = (1 - t)A + tB$, $0 \le t \le 1$. Then A(0) = A, A(1) = B, and
$\varepsilon(A(t)) \le \varepsilon(B)$ for all t. Thus, for each $0 \le t \le 1$, $\sigma(A(t))$ is contained in
D.

(iii) Since the n eigenvalues of A(t) are continuous functions of t, each
connected component of D contains as many eigenvalues of B as of A.

(iv) Use the Matching Theorem to show that this implies

\[
d(\sigma(A), \sigma(B)) \le (2n - 1)(2M)^{1 - 1/n}\, \|A - B\|^{1/n}. \tag{VIII.4}
\]

(v) Interchange the roles of A and B and use the result of Problem II.5.10
to obtain the stronger inequality

\[
d(\sigma(A), \sigma(B)) \le n\,(2M)^{1 - 1/n}\, \|A - B\|^{1/n}. \tag{VIII.5}
\]


The example given in the introduction shows that the exponent 1/n
occurring on the right-hand side of (VIII.5) is necessary. But then homogeneity
considerations require the insertion of another factor like $(2M)^{1 - 1/n}$.
However, the first factor n on the right-hand side of (VIII.5) can be replaced
by a much smaller constant factor. This is shown in the next theorem. We
will use a classical result of Chebyshev used frequently in approximation
theory: if p is any monic polynomial of degree n, then

\[
\max_{0 \le t \le 1} |p(t)| \ge \frac{1}{2^{2n - 1}}. \tag{VIII.6}
\]

(This can be found in standard texts such as P. Henrici, Elements of Numerical
Analysis, Wiley, 1964, p. 194; T.J. Rivlin, An Introduction to the
Approximation of Functions, Dover, 1981, p. 31.) The following lemma is
a generalisation of this inequality.
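To make the Chebyshev bound concrete, one can compare a few monic polynomials on [0, 1] against $2^{-(2n-1)}$. The sketch below is only an illustration; it assumes the standard fact that the shifted Chebyshev polynomial $T_n(2t - 1)/2^{2n-1}$ is monic of degree n, and the names `shifted_chebyshev_monic` and `grid_max` are hypothetical. The maximum is taken over a finite grid that contains both endpoints.

```python
import math

def shifted_chebyshev_monic(n, t):
    """Monic polynomial T_n(2t - 1) / 2^(2n - 1), evaluated for t in [0, 1]."""
    x = max(-1.0, min(1.0, 2.0 * t - 1.0))  # clamp against rounding
    return math.cos(n * math.acos(x)) / 2 ** (2 * n - 1)

def grid_max(p, samples=2001):
    """Maximum of |p| over an equispaced grid on [0, 1], endpoints included."""
    return max(abs(p(k / (samples - 1))) for k in range(samples))

n = 5
bound = 1.0 / 2 ** (2 * n - 1)
print(bound)
print(grid_max(lambda t: shifted_chebyshev_monic(n, t)))  # attains the bound
print(grid_max(lambda t: t ** n))                          # much larger
print(grid_max(lambda t: (t - 0.5) ** n))                  # larger as well
```

The shifted Chebyshev polynomial attains the lower bound, while the other two monic examples exceed it.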


Lemma VIII.1.4 Let $\Gamma$ be a continuous curve in the complex plane with
endpoints a and b. If p is any monic polynomial of degree n, then

\[
\max_{\lambda \in \Gamma} |p(\lambda)| \ge \frac{|b - a|^n}{2^{2n - 1}}. \tag{VIII.7}
\]


Proof. Let L be the straight line through a and b and S the segment of
L between a and b:

\begin{align*}
L &= \{z : z = a + t(b - a),\ t \in \mathbb{R}\}, \\
S &= \{z : z = a + t(b - a),\ 0 \le t \le 1\}.
\end{align*}

For every point z in $\mathbb{C}$, let z' denote its orthogonal projection onto L. Then
$|z - w| \ge |z' - w'|$ for all z and w.
Let $\lambda_i$, $i = 1, \ldots, n$, be the roots of p. Let $\lambda_i' = a + t_i(b - a)$, where
$t_i \in \mathbb{R}$, and let $z = a + t(b - a)$ be any point on L. Then

\[
\prod_{i=1}^{n} |z - \lambda_i'| = \prod_{i=1}^{n} |(t - t_i)(b - a)| = |b - a|^n \prod_{i=1}^{n} |t - t_i|.
\]

From this and the inequality (VIII.6) applied to the polynomial $\prod_{i=1}^{n} (t - t_i)$,
we can conclude that there exists a point $z_0$ on S for which

\[
\prod_{i=1}^{n} |z_0 - \lambda_i'| \ge \frac{|b - a|^n}{2^{2n - 1}}.
\]

Since $\Gamma$ is a continuous curve joining a and b, $z_0 = \lambda_0'$ for some $\lambda_0 \in \Gamma$.
Since $|\lambda_0 - \lambda_i| \ge |\lambda_0' - \lambda_i'|$, we have shown that there exists a point $\lambda_0$ on
$\Gamma$ such that

\[
|p(\lambda_0)| = \prod_{i=1}^{n} |\lambda_0 - \lambda_i| \ge \frac{|b - a|^n}{2^{2n - 1}}.
\]
■


Theorem VIII.1.5 Let A and B be two $n \times n$ matrices. Then

\[
d(\sigma(A), \sigma(B)) \le 4\,(\|A\| + \|B\|)^{1 - 1/n}\, \|A - B\|^{1/n}. \tag{VIII.8}
\]


Proof. Let $A(t) = (1 - t)A + tB$, $0 \le t \le 1$. The eigenvalues of A(t)
trace n continuous curves in the plane as t changes from 0 to 1. The initial
points of these curves are the eigenvalues of A, and their final points are
those of B. So, to prove (VIII.8) it suffices to show that if $\Gamma$ is one of these
curves and a and b are the endpoints of $\Gamma$, then $|a - b|$ is bounded by the
right-hand side of (VIII.8).

Assume that $\|A\| \le \|B\|$ without any loss of generality. By Lemma
VIII.1.4, there exists a point $\lambda_0$ on $\Gamma$ such that

\[
|\det(A - \lambda_0 I)| \ge \frac{|b - a|^n}{2^{2n - 1}}.
\]

Choose $0 \le t_0 \le 1$ such that $\lambda_0$ is an eigenvalue of $(1 - t_0)A + t_0 B$. In
the proof of Theorem VIII.1.1 we have seen that if X, Y are any two $n \times n$
matrices and if $\lambda$ is an eigenvalue of Y, then

\[
|\det(X - \lambda I)| \le \|X - Y\|\, (\|X\| + \|Y\|)^{n - 1}.
\]

Choose $X = A$ and $Y = (1 - t_0)A + t_0 B$. This gives
$\frac{|b - a|^n}{2^{2n - 1}} \le |\det(A - \lambda_0 I)| \le \|A - B\|\, (\|A\| + \|B\|)^{n - 1}$.
Taking nth roots, we obtain the desired conclusion. ■


Note that we have, in fact, shown that the factor 4 in the inequality
(VIII.8) can be replaced by the smaller number $4 \times 2^{-1/n}$. A further improvement
is possible; see the Notes at the end of the chapter. However,
the best possible inequality of this type is not yet known.


VIII.2 Perturbation of Roots of Polynomials


The ideas used above also lead to bounds for the distance between the roots 
of two polynomials. This is discussed below. 


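Lemma VIII.2.1 Let $f(z) = z^n + a_1 z^{n-1} + \cdots + a_n$ be any monic polynomial.
Let

\[
\mu = 2 \max_{1 \le k \le n} |a_k|^{1/k}. \tag{VIII.9}
\]

Then all the roots of f are bounded (in absolute value) by $\mu$.


Proof. If $|z| > \mu$, then

\begin{align*}
\left|\frac{f(z)}{z^n}\right| &= \left|1 + \frac{a_1}{z} + \cdots + \frac{a_n}{z^n}\right| \\
&\ge 1 - \left|\frac{a_1}{z}\right| - \left|\frac{a_2}{z^2}\right| - \cdots - \left|\frac{a_n}{z^n}\right| \\
&> 1 - \frac{1}{2} - \frac{1}{2^2} - \cdots - \frac{1}{2^n} \\
&> 0.
\end{align*}

Such z cannot, therefore, be a root of f. ■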
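A quick numerical illustration of Lemma VIII.2.1: build a monic polynomial from prescribed roots, compute $\mu$ from its coefficients, and confirm that every root has modulus at most $\mu$. This is a sketch only; the helper names `poly_from_roots` and `root_bound` and the sample roots are hypothetical choices made here.

```python
def poly_from_roots(roots):
    """Coefficients [1, a_1, ..., a_n] of the monic polynomial with the given roots."""
    coeffs = [1 + 0j]
    for r in roots:
        new = coeffs + [0j]                 # multiply by z
        for k in range(1, len(new)):
            new[k] -= r * coeffs[k - 1]     # subtract r times the old polynomial
        coeffs = new
    return coeffs

def root_bound(coeffs):
    """mu = 2 * max_k |a_k|^(1/k) as in (VIII.9); coeffs[0] is the leading 1."""
    return 2.0 * max(abs(coeffs[k]) ** (1.0 / k) for k in range(1, len(coeffs)))

roots = [1 + 2j, -3, 0.5j, 2]
coeffs = poly_from_roots(roots)
mu = root_bound(coeffs)
print(mu, max(abs(r) for r in roots))
```

For the real example with roots 1, 2, 3 the coefficients are 1, -6, 11, -6, so $\mu = 2 \max(6, \sqrt{11}, 6^{1/3}) = 12$, comfortably above the largest root.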
Let $\alpha_1, \ldots, \alpha_n$ be the roots of a monic polynomial f. We will denote by
Root f the unordered n-tuple $\{\alpha_1, \ldots, \alpha_n\}$ as well as the subset of the plane
whose elements are the roots of f. We wish to find bounds for the optimal
matching distance d(Root f, Root g) in terms of the distance between the
coefficients of two monic polynomials f and g. Let

\begin{align*}
f(z) &= z^n + a_1 z^{n-1} + \cdots + a_n, \\
g(z) &= z^n + b_1 z^{n-1} + \cdots + b_n
\end{align*}
\hfill (VIII.10)

be two polynomials. Let

\[
\gamma = 2 \max_{1 \le k \le n} \max\left(|a_k|^{1/k}, |b_k|^{1/k}\right), \tag{VIII.11}
\]

\[
\Theta(f, g) = \left\{\sum_{k=1}^{n} |a_k - b_k|\, \gamma^{n-k}\right\}^{1/n}. \tag{VIII.12}
\]

The bounds given below are in terms of these quantities.


Theorem VIII.2.2 Let f, g be two monic polynomials as in (VIII.10).
Then

\[
s(\mathrm{Root}\, f, \mathrm{Root}\, g) \le \left\{\sum_{k=1}^{n} |a_k - b_k|\, \mu^{n-k}\right\}^{1/n}, \tag{VIII.13}
\]

where $\mu$ is given by (VIII.9).

Proof. We have

\[
f(z) - g(z) = \sum_{k=1}^{n} (a_k - b_k) z^{n-k}.
\]

So, if $\alpha$ is any root of f, then, by Lemma VIII.2.1,

\begin{align*}
|g(\alpha)| &\le \sum_{k=1}^{n} |a_k - b_k|\, |\alpha|^{n-k} \\
&\le \sum_{k=1}^{n} |a_k - b_k|\, \mu^{n-k}.
\end{align*}

If the roots of g are $\beta_1, \ldots, \beta_n$, this says that

\[
\prod_{j=1}^{n} |\alpha - \beta_j| \le \sum_{k=1}^{n} |a_k - b_k|\, \mu^{n-k}.
\]

So,

\[
\min_j |\alpha - \beta_j| \le \left\{\sum_{k=1}^{n} |a_k - b_k|\, \mu^{n-k}\right\}^{1/n}.
\]

This proves the theorem. ■
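The bound (VIII.13) can be tried out on a small example. In the sketch below, f has roots 1, 2, 3 and g has roots 1.1, 1.9, 3.2; the helper names and the sample roots are hypothetical, and the quantity computed as `s` is the largest distance from a root of f to the nearest root of g, which is the quantity bounded in the proof above.

```python
def poly_from_roots(roots):
    """Coefficients [1, a_1, ..., a_n] of the monic polynomial with the given roots."""
    coeffs = [1 + 0j]
    for r in roots:
        new = coeffs + [0j]
        for k in range(1, len(new)):
            new[k] -= r * coeffs[k - 1]
        coeffs = new
    return coeffs

def coefficient_bound(a, b):
    """Right-hand side of (VIII.13), with mu computed from the coefficients of f."""
    n = len(a) - 1
    mu = 2.0 * max(abs(a[k]) ** (1.0 / k) for k in range(1, n + 1))
    total = sum(abs(a[k] - b[k]) * mu ** (n - k) for k in range(1, n + 1))
    return total ** (1.0 / n)

def nearest_root_spread(roots_f, roots_g):
    """max over roots alpha of f of the distance from alpha to the nearest root of g."""
    return max(min(abs(alpha - beta) for beta in roots_g) for alpha in roots_f)

roots_f = [1.0, 2.0, 3.0]
roots_g = [1.1, 1.9, 3.2]
s = nearest_root_spread(roots_f, roots_g)
bound = coefficient_bound(poly_from_roots(roots_f), poly_from_roots(roots_g))
print(s, bound)
```

Here s = 0.2 while the bound is about 3.36, so the inequality holds with a good deal of room; the estimate is designed to capture the worst-case exponent 1/n rather than to be tight for well-separated roots.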
Corollary VIII.2.3 The Hausdorff distance between the roots of f and g
is bounded as

\[
h(\mathrm{Root}\, f, \mathrm{Root}\, g) \le \Theta(f, g). \tag{VIII.14}
\]

Theorem VIII.2.4 The optimal matching distance between the roots of f
and g is bounded as

\[
d(\mathrm{Root}\, f, \mathrm{Root}\, g) \le 4\, \Theta(f, g). \tag{VIII.15}
\]


Proof. The argument is similar to that used in proving Theorem VIII.1.5.
Let $f_t = (1 - t)f + tg$, $0 \le t \le 1$. If $\lambda$ is a root of $f_t$, then by Lemma VIII.2.1,
$|\lambda| \le \gamma$ and we have

\begin{align*}
|f(\lambda)| &= |t(f(\lambda) - g(\lambda))| \le |f(\lambda) - g(\lambda)| \\
&\le \sum_{k=1}^{n} |a_k - b_k|\, |\lambda|^{n-k} \le [\Theta(f, g)]^n.
\end{align*}

The roots of $f_t$ trace n continuous curves as t changes from 0 to 1. The
initial points of these curves are the roots of f, and the final points are the
roots of g. Let $\Gamma$ be any one of these curves, and let a, b be its endpoints.
Then, by Lemma VIII.1.4, there exists a point $\lambda$ on $\Gamma$ such that

\[
|f(\lambda)| \ge \frac{|a - b|^n}{2^{2n - 1}}.
\]

This shows that

\[
|a - b| \le 4 \times 2^{-1/n}\, \Theta(f, g),
\]

and that is enough for proving the theorem. ■


Exercise VIII.2.5 Let $h = (f + g)/2$. Then any convex combination of f
and g can also be expressed as $h + t(f - g)$ for some t with $|t| \le \frac{1}{2}$. Use
this to show that

\[
d(\mathrm{Root}\, f, \mathrm{Root}\, g) \le 4^{1 - 1/n}\, \Theta(f, g). \tag{VIII.16}
\]

The first factor in the above inequality can be reduced further; see the
Notes at the end of the chapter. However, it is not known what the optimal
value of this factor is. It is known that no constant smaller than 2 can
replace this factor if the inequality is to be valid for all degrees n.

Note that the only property of $\gamma$ used in the proof is that it is an upper
bound for all the roots of the polynomial $(1 - t)f + tg$. Any other constant
with this property could be used instead.


Exercise VIII.2.6 In Problem I.6.11, a bound for the distance between
the coefficients of the characteristic polynomials of two matrices was obtained.
Use that and the combinatorial identity

\[
\sum_{k=0}^{n} k \binom{n}{k} = n\, 2^{n-1}
\]

to show that for any two $n \times n$ matrices A, B

\[
d(\sigma(A), \sigma(B)) \le n^{1/n}\, (8M)^{1 - 1/n}\, \|A - B\|^{1/n}, \tag{VIII.17}
\]

where $M = \max(\|A\|, \|B\|)$. This is weaker than the bound obtained in
Theorem VIII.1.5.


VIII.3 Diagonalisable Matrices


A matrix A is said to be diagonalisable if it is similar to a diagonal
matrix; i.e., if there exists an invertible matrix S and a diagonal matrix D
such that $A = SDS^{-1}$. This is equivalent to saying that there are n linearly
independent vectors in $\mathbb{C}^n$ that are eigenvectors for A. If S is unitary (or the
eigenvectors of A orthonormal), A is normal. In this section we will derive
some perturbation bounds for diagonalisable matrices. These are natural
generalisations of some results obtained for normal matrices in Chapter 6.
The condition number of an invertible matrix S is defined as

\[
\mathrm{cond}(S) = \|S\|\, \|S^{-1}\|.
\]

Note that $\mathrm{cond}(S) \ge 1$, and $\mathrm{cond}(S) = 1$ if and only if S is a scalar multiple
of a unitary matrix.


Our first theorem is a generalisation of Theorem VI.3.3. 
Theorem VIII.3.1 Let $A = SDS^{-1}$, where D is a diagonal matrix and
S an invertible matrix. Then, for any matrix B,

\[
s(\sigma(B), \sigma(A)) \le \mathrm{cond}(S)\, \|A - B\|. \tag{VIII.18}
\]


Proof. The proof of Theorem VI.3.3 can be modified to give a proof of
this. Let $\varepsilon = \mathrm{cond}(S)\, \|A - B\|$. We want to show that if $\beta$ is any eigenvalue
of B, then $\beta$ is within a distance $\varepsilon$ of some eigenvalue $\alpha_j$ of A. By applying
a translation, we may assume that $\beta = 0$. If none of the $\alpha_j$ is within a
distance $\varepsilon$ of this, then A is invertible and

\[
\|A^{-1}\| = \|SD^{-1}S^{-1}\| \le \mathrm{cond}(S)\, \|D^{-1}\| < \frac{\mathrm{cond}(S)}{\varepsilon} = \frac{1}{\|A - B\|}.
\]

So,

\[
\|A^{-1}(B - A)\| \le \|A^{-1}\|\, \|B - A\| < 1.
\]

Hence, $I + A^{-1}(B - A)$ is invertible, and so is $B = A(I + A^{-1}(B - A))$.
But then B could not have had a zero eigenvalue. ■
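A small 2 x 2 experiment shows the role of cond(S) in (VIII.18). The sketch below uses closed-form eigenvalues and operator norms for real 2 x 2 matrices so that it needs only the standard library; the matrices S, D and the perturbation, as well as all function names, are hypothetical choices made here for illustration.

```python
import cmath
import math

def matmul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inverse(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def op_norm(m):
    """Operator norm of a real 2x2 matrix: sqrt of the largest eigenvalue of M^T M."""
    mt = [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]
    p = matmul(mt, m)
    tr = p[0][0] + p[1][1]
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    return math.sqrt((tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0))) / 2.0)

def eigenvalues(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return [(tr + disc) / 2.0, (tr - disc) / 2.0]

def s_dist(spec_b, spec_a):
    """s(sigma(B), sigma(A)) = max_j min_i |alpha_i - beta_j|."""
    return max(min(abs(a - b) for a in spec_a) for b in spec_b)

S = [[1.0, 1.0], [0.0, 0.1]]
D = [[1.0, 0.0], [0.0, 2.0]]
A = matmul(matmul(S, D), inverse(S))          # A = S D S^{-1}
B = [[A[0][0], A[0][1]], [A[1][0] + 0.01, A[1][1]]]
E = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

lhs = s_dist(eigenvalues(B), eigenvalues(A))
rhs = op_norm(S) * op_norm(inverse(S)) * op_norm(E)
print(lhs, rhs)
```

With these choices cond(S) is about 20, the perturbation has norm 0.01, and the eigenvalues of B move by roughly 0.09, which is nine times the size of the perturbation but still within the bound of about 0.2.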


Note that the properties of the operator norm used above are (i) $I + A$ is
invertible if $\|A\| < 1$; (ii) $\|AB\| \le \|A\|\, \|B\|$ for all A, B; (iii) $\|D\| = \max |d_i|$
if $D = \mathrm{diag}(d_1, \ldots, d_n)$. There are several other norms that satisfy these
properties. For example, norms induced by the p-norms on $\mathbb{C}^n$, $1 \le p \le \infty$,
all have these three properties. So, the inequality (VIII.18) is true for a
large class of norms.


Exercise VIII.3.2 Using continuity arguments and the Matching Theorem
show that if A and B are as in Theorem VIII.3.1, then

\[
d(\sigma(A), \sigma(B)) \le (2n - 1)\, \mathrm{cond}(S)\, \|A - B\|.
\]

If B is also diagonalisable and $B = TD'T^{-1}$, then

\[
d(\sigma(A), \sigma(B)) \le n\, \mathrm{cond}(S)\, \mathrm{cond}(T)\, \|A - B\|.
\]


An inequality stronger than this will be proved below by other means. 


Theorem VIII.3.1 also follows from the following theorem. Both of these 
are called the Bauer-Fike Theorems. 


Theorem VIII.3.3 Let S be an invertible matrix. If $\beta$ is an eigenvalue
of B but not of A, then

\[
\|S(A - \beta I)^{-1} S^{-1}\|^{-1} \le \|S(A - B)S^{-1}\|. \tag{VIII.19}
\]

Proof. We can write

\begin{align*}
S(B - \beta I)S^{-1} &= S[(A - \beta I) + B - A]S^{-1} \\
&= S(A - \beta I)S^{-1} \{I + S(A - \beta I)^{-1}S^{-1} \cdot S(B - A)S^{-1}\}.
\end{align*}

Note that the matrix on the left-hand side of this equation is not invertible.
Since $A - \beta I$ is invertible, the matrix inside the braces on the right-hand
side is not invertible. Hence,

\begin{align*}
1 &\le \|S(A - \beta I)^{-1}S^{-1} \cdot S(B - A)S^{-1}\| \\
&\le \|S(A - \beta I)^{-1}S^{-1}\|\, \|S(B - A)S^{-1}\|.
\end{align*}

This proves the theorem. ■


We now obtain, for diagonalisable matrices, analogues of some of the 
major perturbation bounds derived in earlier chapters for Hermitian ma- 
trices and for normal matrices. Some auxiliary theorems about norms of 
commutators are proved first. 


Theorem VIII.3.4 Let A, B be Hermitian operators, and let $\Gamma$ be a positive
operator whose smallest eigenvalue is $\gamma$ (i.e., $\Gamma \ge \gamma I \ge 0$). Then

\[
|||A\Gamma - \Gamma B||| \ge \gamma\, |||A - B||| \tag{VIII.20}
\]

for every unitarily invariant norm.

Proof. Let $T = A\Gamma - \Gamma B$, and let $Y = T + T^*$. Then

\[
Y = (A - B)\Gamma + \Gamma(A - B).
\]

This is the Sylvester equation that we studied in Chapter 7. From Theorem
VII.2.12, we get

\[
2\gamma\, |||A - B||| \le |||Y||| \le 2\, |||T||| = 2\, |||A\Gamma - \Gamma B|||.
\]

This proves the theorem. ■


Corollary VIII.3.5 Let A, B be any two operators, and let $\Gamma$ be a positive
operator, $\Gamma \ge \gamma I \ge 0$. Then

\[
|||(A\Gamma - \Gamma B) \oplus (A^*\Gamma - \Gamma B^*)||| \ge \gamma\, |||(A - B) \oplus (A - B)||| \tag{VIII.21}
\]

for every unitarily invariant norm.


Proof. This follows from (VIII.20) applied to the Hermitian operators
$\begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & B \\ B^* & 0 \end{pmatrix}$, and the positive operator $\begin{pmatrix} \Gamma & 0 \\ 0 & \Gamma \end{pmatrix}$. ■

Corollary VIII.3.6 Let A and B be unitary operators, and let $\Gamma$ be a
positive operator, $\Gamma \ge \gamma I \ge 0$. Then, for every unitarily invariant norm,

\[
|||A\Gamma - \Gamma B||| \ge \gamma\, |||A - B|||. \tag{VIII.22}
\]

Proof. If A and B are unitary, then

\[
s_j(A\Gamma - \Gamma B) = s_j(\Gamma B^* - A^*\Gamma) = s_j(A^*\Gamma - \Gamma B^*).
\]

Thus the operator $(A\Gamma - \Gamma B) \oplus (A^*\Gamma - \Gamma B^*)$ has the same singular values
as those of $A\Gamma - \Gamma B$, each counted twice. From this we see that (VIII.21)
implies (VIII.22) for all Ky Fan norms, and hence for all unitarily invariant
norms. ■


Corollary VIII.3.7 Let A, B be normal operators, and let $\Gamma$ be a positive
operator, $\Gamma \ge \gamma I \ge 0$. Then

\[
\|A\Gamma - \Gamma B\|_2 \ge \gamma\, \|A - B\|_2. \tag{VIII.23}
\]


Proof. Suppose that A is normal and its eigenvalues are $\alpha_1, \ldots, \alpha_n$. Then
(choosing an orthonormal basis in which A is diagonal) one sees that for
every X

\[
\|AX - XA\|_2^2 = \sum_{i,j} |\alpha_i - \alpha_j|^2\, |x_{ij}|^2 = \|A^*X - XA^*\|_2^2.
\]

If A, B are normal, then, applying this to $\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$ in place of A and $\begin{pmatrix} 0 & X \\ 0 & 0 \end{pmatrix}$
in place of X, one obtains

\[
\|AX - XB\|_2 = \|A^*X - XB^*\|_2. \tag{VIII.24}
\]

Using this, the inequality (VIII.23) can be derived from (VIII.21). ■


A famous theorem (called the Fuglede-Putnam Theorem, valid in 
Hilbert spaces of finite or infinite dimensions) says that if A and B are 
normal, then for any operator X, AX = XB if and only if A*X = XB*. 
The equality (VIII.24) says much more than this. 


Example VIII.3.8 For normal A, B, the inequality (VIII.23) is not always
true if the Hilbert-Schmidt norm is replaced by the operator norm. A
numerical example illustrating this is given below. Let

\[
\Gamma = \mathrm{diag}(0.6384,\ 0.6384,\ 1.0000),
\]

\[
A = \begin{pmatrix}
-0.5205 - 0.1642i & 0.1042 - 0.3618i & -0.1326 - 0.0260i \\
-0.1299 + 0.1709i & 0.4218 + 0.4685i & -0.5692 - 0.3178i \\
0.2850 - 0.1808i & -0.3850 - 0.4257i & -0.2973 - 0.1715i
\end{pmatrix},
\]

\[
B = \begin{pmatrix}
-0.6040 + 0.1760i & 0.5128 - 0.2865i & 0.1306 + 0.0154i \\
0.0582 + 0.2850i & 0.0154 + 0.4497i & -0.5001 - 0.2833i \\
0.4081 - 0.3333i & -0.0721 - 0.2545i & -0.2686 + 0.0247i
\end{pmatrix}.
\]

Then

\[
\frac{\|A\Gamma - \Gamma B\|}{\gamma\, \|A - B\|} = 0.8763.
\]

Theorem VIII.3.4 and Corollary VIII.3.7 are used in the proofs below.
Alternate proofs of both these results are sketched in the problems. These
proofs do not draw on the results in Chapter 7.


Theorem VIII.3.9 Let A, B be any two matrices such that $A = SD_1S^{-1}$,
$B = TD_2T^{-1}$, where S, T are invertible matrices and $D_1$, $D_2$ are real
diagonal matrices. Then
$$|||\mathrm{Eig}^{\downarrow}(A) - \mathrm{Eig}^{\downarrow}(B)||| \ \le\ [\mathrm{cond}(S)\,\mathrm{cond}(T)]^{1/2}\, |||A - B||| \tag{VIII.25}$$
for every unitarily invariant norm.


Proof. When A, B are Hermitian, this has already been proved; see (IV.62). 
This special case will be used to prove the general result. 
We can write
$$A - B = SD_1S^{-1} - TD_2T^{-1} = S(D_1S^{-1}T - S^{-1}TD_2)T^{-1}.$$
Hence,
$$|||D_1S^{-1}T - S^{-1}TD_2||| = |||S^{-1}(A - B)T||| \ \le\ \|S^{-1}\|\ |||A - B|||\ \|T\|.$$
We could also write
$$A - B = T(T^{-1}SD_1 - D_2T^{-1}S)S^{-1}$$
and get
$$|||T^{-1}SD_1 - D_2T^{-1}S||| \ \le\ \|T^{-1}\|\ |||A - B|||\ \|S\|.$$
Let $S^{-1}T$ have the singular value decomposition $S^{-1}T = U\Gamma V$. Then
$$|||D_1S^{-1}T - S^{-1}TD_2||| = |||D_1U\Gamma V - U\Gamma VD_2||| = |||U^*D_1U\Gamma - \Gamma VD_2V^*||| = |||A'\Gamma - \Gamma B'|||,$$
where $A' = U^*D_1U$ and $B' = VD_2V^*$ are Hermitian matrices. Note that
$T^{-1}S = V^*\Gamma^{-1}U^*$. So, by the same argument,
$$|||T^{-1}SD_1 - D_2T^{-1}S||| = |||\Gamma^{-1}A' - B'\Gamma^{-1}|||.$$
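We have, thus, two inequalities
$$\alpha\, |||A - B||| \ \ge\ |||A'\Gamma - \Gamma B'|||,$$
and
$$\beta\, |||A - B||| \ \ge\ |||A'\Gamma^{-1} - \Gamma^{-1}B'|||,$$
where $\alpha = \|S^{-1}\|\,\|T\|$, $\beta = \|T^{-1}\|\,\|S\|$. (In the second one we have used the fact
that $A'\Gamma^{-1} - \Gamma^{-1}B'$ is the adjoint of $\Gamma^{-1}A' - B'\Gamma^{-1}$, and so has the same
norm.) Combining these two inequalities and using the triangle inequality,
we have
$$2\, |||A - B||| \ \ge\ \Big|\Big|\Big| A'\Big(\frac{\Gamma}{\alpha} + \frac{\Gamma^{-1}}{\beta}\Big) - \Big(\frac{\Gamma}{\alpha} + \frac{\Gamma^{-1}}{\beta}\Big)B' \Big|\Big|\Big|.$$
The operator inequality $\Big[\Big(\frac{\Gamma}{\alpha}\Big)^{1/2} - \Big(\frac{\Gamma^{-1}}{\beta}\Big)^{1/2}\Big]^2 \ge 0$ implies that
$\frac{\Gamma}{\alpha} + \frac{\Gamma^{-1}}{\beta} \ge \frac{2}{(\alpha\beta)^{1/2}}\, I$. Hence, by Theorem VIII.3.4,
$$2\, |||A - B||| \ \ge\ \frac{2}{(\alpha\beta)^{1/2}}\, |||A' - B'|||.$$
But A' and B' are Hermitian matrices with the same eigenvalues as those
of A and B, respectively. Hence, by the result for Hermitian matrices that
was mentioned at the beginning,
$$|||A' - B'||| \ \ge\ |||\mathrm{Eig}^{\downarrow}(A) - \mathrm{Eig}^{\downarrow}(B)|||.$$
Combining the three inequalities above leads to (VIII.25). ■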
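
As a reading aid, the final combination can be written out in one chain. This is only a restatement of the steps of the proof, under the assumption that cond(S) denotes $\|S\|\,\|S^{-1}\|$ (and similarly for T), so that $\alpha\beta = \mathrm{cond}(S)\,\mathrm{cond}(T)$:

```latex
\[
|||\mathrm{Eig}^{\downarrow}(A)-\mathrm{Eig}^{\downarrow}(B)|||
\;\le\; |||A'-B'|||
\;\le\; (\alpha\beta)^{1/2}\,|||A-B|||
\;=\; [\mathrm{cond}(S)\,\mathrm{cond}(T)]^{1/2}\,|||A-B||| ,
\]
\[
\alpha\beta \;=\; \|S^{-1}\|\,\|T\|\,\|T^{-1}\|\,\|S\| \;=\; \mathrm{cond}(S)\,\mathrm{cond}(T).
\]
```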


Theorem VIII.3.10 Let A, B be any two matrices such that
$A = SD_1S^{-1}$, $B = TD_2T^{-1}$, where S, T are invertible matrices and $D_1$, $D_2$
are diagonal matrices. Then
$$d_2(\sigma(A), \sigma(B)) \ \le\ [\mathrm{cond}(S)\,\mathrm{cond}(T)]^{1/2}\, \|A - B\|_2. \tag{VIII.26}$$

Proof. When A, B are normal, this is just the Hoffman-Wielandt
inequality; see (VI.34). The general case can be obtained from this using
the inequality (VIII.23). The argument is the same as in the proof of the
preceding theorem. ■


Theorems VIII.3.9 and VIII.3.10 do reduce to the ones proved earlier 
for Hermitian and normal matrices. However, neither of them gives tight 
bounds. Even in the favourable case when A and B commute, the left-hand
side of (VIII.25) is generally smaller than $|||A - B|||$, and this is aggravated
further by introducing the condition number coefficients. 


Exercise VIII.3.11 Let A and B be as in Theorem VIII.3.10. Suppose
that all eigenvalues of A and B have modulus 1. Show that
$$d_{|||\cdot|||}(\sigma(A), \sigma(B)) \ \le\ \frac{\pi}{2}\, [\mathrm{cond}(S)\,\mathrm{cond}(T)]^{1/2}\, |||A - B||| \tag{VIII.27}$$
for all unitarily invariant norms. For the special case of the operator norm,
the factor $\frac{\pi}{2}$ above can be replaced by 1.

[Hint: Use Corollary VIII.3.6 and the theorems on unitary matrices in
Chapter 6.]


VIII.4 Matrices with Real Eigenvalues


In this section we will consider a collection $\mathcal{R}$ of matrices that has two
special properties: $\mathcal{R}$ is a real vector space and every element of $\mathcal{R}$ has only
real eigenvalues. The set of all Hermitian matrices is an example of such a
collection. Another example is given below. Such families of matrices arise
in the study of vectorial hyperbolic differential equations. The behaviour of
the eigenvalues of such a family has some similarities to that of Hermitian
matrices. This is studied below.


Example VIII.4.1 Fix a block decomposition of matrices in which all
diagonal blocks are square. Let $\mathcal{R}$ be the set of all matrices that are block
upper triangular in this decomposition and whose diagonal blocks are
Hermitian. Then $\mathcal{R}$ is a real vector space (of real dimension $n^2$) and every
element of $\mathcal{R}$ has real eigenvalues.


In this book we have called a matrix positive if it is Hermitian and all
its eigenvalues are nonnegative. A matrix A will be called laxly positive
if all eigenvalues of A are nonnegative. This will be written symbolically as
$0 \le_L A$. If all eigenvalues of A are positive, we will say A is strictly laxly
positive, and write $0 <_L A$. We say $A \le_L B$ if $B - A$ is laxly positive.

We will see below that if $\mathcal{R}$ is a real vector space of matrices each of
which has only real eigenvalues, then the laxly positive elements form a
convex cone in $\mathcal{R}$. So, the order $\le_L$ defines a partial order on $\mathcal{R}$.

Given two matrices A and B, we say that $\lambda$ is an eigenvalue of A with
respect to B if there exists a nonzero vector x such that $Ax = \lambda Bx$.
Thus, eigenvalues of A with respect to B are the n roots of the equation
$\det(A - \lambda B) = 0$. These are also called generalised eigenvalues.


Lemma VIII.4.2 Let A, B be two matrices such that every real linear
combination of A and B has real eigenvalues. Suppose B is strictly laxly
positive. Then for every real $\lambda$, $-A + \lambda I$ has real eigenvalues with respect
to B.

Proof. We have to show that for any real $\lambda$ the equation
$$\det(-A + \lambda I - \mu B) = 0 \tag{VIII.28}$$
is satisfied by n real $\mu$.

Let $\mu$ be any given real number. Then, by hypothesis, there exist n real
$\lambda$ that satisfy (VIII.28), namely the eigenvalues of $A + \mu B$. Denote these
as $\varphi_j(\mu)$ and arrange them so that $\varphi_1(\mu) \ge \varphi_2(\mu) \ge \cdots \ge \varphi_n(\mu)$. We have
$$\det(-A + \lambda I - \mu B) = \prod_{k=1}^{n} (\lambda - \varphi_k(\mu)). \tag{VIII.29}$$
By the results of Section VI.1, each $\varphi_k(\mu)$ is continuous as a function of
$\mu$. For large $\mu$, $\frac{1}{\mu}(A + \mu B)$ is close to B. So, $\frac{1}{\mu}\varphi_k(\mu)$ approaches $\lambda_k^{\downarrow}(B)$ as
$\mu \to \infty$, and $\lambda_k^{\uparrow}(B)$ as $\mu \to -\infty$. Since B is strictly laxly positive, this
implies that $\varphi_k(\mu) \to \pm\infty$ as $\mu \to \pm\infty$.

So, every $\lambda$ in $\mathbb{R}$ is in the range of $\varphi_k$ for each k = 1, 2, ..., n. Thus, for
each $\lambda$, there exist n real $\mu$ that satisfy (VIII.28). ■


Proposition VIII.4.3 Let A, B be two matrices such that every real linear
combination of A and B has real eigenvalues. Suppose A is (strictly) laxly
negative. Then every eigenvalue of $A + iB$ has (strictly) negative real part.

Proof. Let $\mu = \mu_1 + i\mu_2$ be an eigenvalue of $A + iB$. Then
$\det(A + iB - \mu_1 I - i\mu_2 I) = 0$. Multiply this by $i^n$ to get
$$\det[(-B + \mu_2 I) + i(A - \mu_1 I)] = 0.$$
So the matrix $-B + \mu_2 I$ has an eigenvalue $-i$ with respect to the matrix
$A - \mu_1 I$, and it has an eigenvalue $i$ with respect to the matrix $-(A - \mu_1 I)$.

By hypothesis, every real linear combination of $A - \mu_1 I$ and B has real
eigenvalues. Hence, by Lemma VIII.4.2, $A - \mu_1 I$ cannot be either strictly
laxly positive or strictly laxly negative. In other words,
$$\lambda_n^{\downarrow}(A) \ \le\ \mu_1 \ \le\ \lambda_1^{\downarrow}(A).$$
This proves the proposition. ■
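
The proposition can be illustrated numerically. The sketch below is only a sanity check on one hypothetical pair of 2 x 2 Hermitian matrices (chosen here for illustration, not taken from the text); for such a pair every real linear combination is Hermitian, so the hypothesis holds. The helper `eig2` is an assumed name and uses the quadratic formula for the characteristic polynomial.

```python
import cmath

def eig2(m):
    """Eigenvalues of a 2x2 matrix [[a, b], [c, d]] via the quadratic formula."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)

# Hypothetical test data: real symmetric (hence Hermitian) A and B, so every
# real linear combination has real eigenvalues. A is negative definite.
A = [[-2, 1], [1, -3]]
B = [[0, 1], [1, 5]]

# The matrix A + iB, built entrywise.
M = [[A[i][j] + 1j * B[i][j] for j in range(2)] for i in range(2)]

eig_A = sorted(z.real for z in eig2(A))   # [smallest, largest] eigenvalue of A
real_parts = [z.real for z in eig2(M)]    # the numbers mu_1 in the proof

for mu1 in real_parts:
    # Each real part is negative and lies between the extreme eigenvalues of A.
    print(mu1, mu1 < 0, eig_A[0] <= mu1 <= eig_A[1])
```

Both real parts land between the smallest and the largest eigenvalue of A, as the last display in the proof predicts.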


Exercise VIII.4.4 With notations as in the above proof, show that
$\lambda_n^{\downarrow}(B) \le \mu_2 \le \lambda_1^{\downarrow}(B)$.


Theorem VIII.4.5 Let $\mathcal{R}$ be a real vector space whose elements are
matrices with real eigenvalues. Let $A, B \in \mathcal{R}$ and let $A \le_L B$. Then
$\lambda_k^{\downarrow}(A) \le \lambda_k^{\downarrow}(B)$ for k = 1, 2, ..., n.

Proof. We will prove a more general statement: if $A, B \in \mathcal{R}$ and $0 \le_L B$,
then $\lambda_k^{\downarrow}(A + \mu B)$ is a monotonically increasing function of the real variable
$\mu$. It is enough to prove this when $0 <_L B$; the general case follows by
continuity. In the notation of Lemma VIII.4.2, $\lambda_k^{\downarrow}(A + \mu B) = \varphi_k(\mu)$. Suppose
$\varphi_k(\mu)$ decreases in some interval. Then we can choose a real number $\lambda$ such
that $\lambda - \varphi_k(\mu)$ increases from a negative to a positive value in this interval.
Since $\varphi_k(\mu) \to \pm\infty$ as $\mu \to \pm\infty$, for this value of $\lambda$, $\lambda - \varphi_k(\mu)$ vanishes
for at least three values of $\mu$. So, in the representation (VIII.29) this factor
contributes at least three zeroes. The remaining factors contribute at least
one zero each. So, for this $\lambda$, the equation (VIII.28) has at least n + 2 roots
$\mu$. This is impossible, since the left-hand side of (VIII.28) is a polynomial
of degree at most n in $\mu$. ■


Theorem VIII.4.6 Let $\mathcal{R}$ be a real vector space whose elements are
matrices with real eigenvalues. Let $A, B \in \mathcal{R}$. Then
$$\lambda_k^{\downarrow}(A) + \lambda_n^{\downarrow}(B) \ \le\ \lambda_k^{\downarrow}(A + B) \ \le\ \lambda_k^{\downarrow}(A) + \lambda_1^{\downarrow}(B) \tag{VIII.30}$$
for k = 1, 2, ..., n.

Proof. The matrix $B - \lambda_n^{\downarrow}(B)I$ is laxly positive. So, by the argument in
the proof of the preceding theorem, $\lambda_k^{\downarrow}(A + \mu B) - \mu\lambda_n^{\downarrow}(B)$ is a monotonically
increasing function of $\mu$. Choose $\mu = 0, 1$ to get the first inequality in
(VIII.30). The same argument shows that $\lambda_k^{\downarrow}(A + \mu B) - \mu\lambda_1^{\downarrow}(B)$ is a
monotonically decreasing function of $\mu$. This leads to the second inequality. ■


Corollary VIII.4.7 On the vector space $\mathcal{R}$, the function $\lambda_1^{\downarrow}(A)$ is convex
and the function $\lambda_n^{\downarrow}(A)$ is concave in the argument A.


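Theorem VIII.4.8 Let A and B be two matrices such that all real linear
combinations of A and B have real eigenvalues. Then
$$\max_{1 \le k \le n} |\lambda_k^{\downarrow}(A) - \lambda_k^{\downarrow}(B)| \ \le\ \mathrm{spr}(A - B) \ \le\ \|A - B\|. \tag{VIII.31}$$

Proof. Let $\mathcal{R}$ be the real vector space generated by A and B. By Theorem
VIII.4.6,
$$\lambda_k^{\downarrow}(A) + \lambda_n^{\downarrow}(B - A) \ \le\ \lambda_k^{\downarrow}(B) \ \le\ \lambda_k^{\downarrow}(A) + \lambda_1^{\downarrow}(B - A).$$
So,
$$\max_k |\lambda_k^{\downarrow}(B) - \lambda_k^{\downarrow}(A)| \ \le\ \max\big(|\lambda_1^{\downarrow}(B - A)|,\ |\lambda_n^{\downarrow}(B - A)|\big) = \mathrm{spr}(A - B) \ \le\ \|A - B\|.$$ ■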


Note that Weyl’s Perturbation Theorem is included in this as a special 
case. 
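
A small numerical illustration of (VIII.31) outside the Hermitian setting may help. The sketch below uses two hypothetical real upper triangular 2 x 2 matrices (an instance of Example VIII.4.1 with 1 x 1 diagonal blocks, chosen here for illustration), whose eigenvalues are simply the diagonal entries. The helper name `op_norm_2x2` is an assumption of this sketch.

```python
import math

def op_norm_2x2(m):
    """Operator norm of a real 2x2 matrix: the square root of the largest
    eigenvalue of m^T m."""
    (a, b), (c, d) = m
    # Entries of the symmetric matrix m^T m = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    tr, det = p + r, p * r - q * q
    return math.sqrt((tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2)

# Hypothetical data: real upper triangular matrices, so every real linear
# combination is upper triangular with real diagonal, hence real eigenvalues.
A = [[1.0, 10.0], [0.0, 3.0]]
B = [[2.0, -7.0], [0.0, 2.5]]

eig_A = sorted((A[0][0], A[1][1]), reverse=True)   # decreasing order
eig_B = sorted((B[0][0], B[1][1]), reverse=True)
diff = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

lhs = max(abs(a - b) for a, b in zip(eig_A, eig_B))
spr = max(abs(diff[0][0]), abs(diff[1][1]))        # spectral radius of A - B
norm = op_norm_2x2(diff)

print(lhs, spr, norm)
```

In this example the large off-diagonal entry inflates the operator norm of A - B but not its spectral radius, which shows why the middle term in (VIII.31) can be much the sharper of the two bounds.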


Exercise VIII.4.9 Show that if only A, B and A + B are assumed to have
real eigenvalues, then the inequality (VIII.31) might not be true.


VIII.5 Eigenvalues with Symmetries


We have remarked earlier that the exponent 1/n occurring in the bound 
(VIII.8) is unavoidable. However, if A and B are restricted to some special 
classes, this can be improved. In this section we identify some useful classes 
of matrices where this exponent can be improved (though not eliminated 
altogether). These are matrices whose eigenvalues appear as pairs $\pm\lambda$ or,
more generally, as tuples $\{\lambda, \omega\lambda, \ldots, \omega^{p-1}\lambda\}$, where $\omega$ is a pth root of unity.
We will give interesting examples of large classes of such matrices, and then
show how this symmetric distribution of their eigenvalues can be exploited
to get better bounds.


Example VIII.5.1 Let $A^T$ denote the transpose of a matrix A. A complex
matrix is called symmetric if $A^T = A$ and skew-symmetric if $A^T = -A$.
If A is a skew-symmetric matrix, then $\lambda$ is an eigenvalue of A if and only
if $-\lambda$ is. The class of all such matrices forms a Lie algebra. This is the Lie
algebra associated with the complex orthogonal group.


Example VIII.5.2 If $A^T$ is similar to $-A$, then, clearly, $\lambda$ is an
eigenvalue of A if and only if $-\lambda$ is. The Lie algebra corresponding to the
symplectic Lie group contains matrices that have this property. Let n be an even
number n = 2r. Let $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$, where I is the identity matrix of order r.
Let A be an n x n matrix such that $A^T = -JAJ^{-1}$. It is easy to see that
we can then write
$$A = \begin{pmatrix} A_1 & A_2 \\ A_3 & -A_1^T \end{pmatrix},$$
where $A_1$, $A_2$, $A_3$ are r x r matrices of which $A_2$ and $A_3$ are symmetric.
(Comparing the blocks of $A^T$ with those of $-JAJ^{-1}$ gives $A_2^T = A_2$ and $A_3^T = A_3$.)
The collection of all such matrices is the Lie algebra associated with the
symplectic group.


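Example VIII.5.3 Let X be a matrix of order n = pr having a special
form
$$X = \begin{pmatrix}
0 & A_1 & 0 & \cdots & 0 \\
0 & 0 & A_2 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & A_{p-1} \\
A_p & 0 & 0 & \cdots & 0
\end{pmatrix},$$
where $A_1, \ldots, A_p$ are matrices of order r. Let $Y = \mathrm{diag}(I_r, \omega I_r, \ldots, \omega^{p-1}I_r)$,
where $\omega$ is the primitive pth root of unity. Then $Y^{-1}XY = \omega X$. So, if $\lambda$ is
an eigenvalue of X, then so are $\omega\lambda, \omega^2\lambda, \ldots, \omega^{p-1}\lambda$.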
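
The relation between X and its conjugate by Y is easy to check by machine in the smallest nontrivial case. The following sketch takes p = 3 and r = 1 with hypothetical scalar blocks (illustrative values, not from the text), and takes $\omega = e^{2\pi i/p}$ as the primitive root, which is an assumption of this sketch. It uses the fact that, Y being diagonal, conjugation by Y multiplies the (j, k) entry by $\omega^{k-j}$.

```python
import cmath

p = 3
w = cmath.exp(2j * cmath.pi / p)      # a primitive cube root of unity

# Hypothetical 1x1 blocks A_1, A_2, A_3 (so r = 1 and n = 3).
a = [2.0, -1.0 + 1.0j, 0.5]

# X has A_j in position (j, j+1) and A_p in the bottom-left corner.
X = [[0j] * p for _ in range(p)]
for j in range(p):
    X[j][(j + 1) % p] = a[j]

# Y = diag(1, w, w^2) is diagonal, so (Y^{-1} X Y)_{jk} = w^(k-j) * X_{jk}.
conj = [[X[j][k] * w ** (k - j) for k in range(p)] for j in range(p)]

# Largest entrywise deviation between Y^{-1} X Y and w X.
err = max(abs(conj[j][k] - w * X[j][k]) for j in range(p) for k in range(p))
print(err)   # expected to be at rounding-error level
```

The only entry that needs the relation $\omega^p = 1$ is the corner block $A_p$, where the factor $\omega^{-(p-1)}$ must be recognised as $\omega$.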


Exercise VIII.5.4 Let $Z = \begin{pmatrix} R & A_1 \\ A_2 & -R \end{pmatrix}$, and suppose R commutes with $A_1$.
Show that $\mathrm{tr}\, Z^k = 0$ if k is odd. Use this to show that $\lambda$ is an eigenvalue of
Z if and only if $-\lambda$ is.


Exercise VIII.5.5 Let $\omega$ be the primitive pth root of unity. If X, Y are
two matrices such that $XY = \omega YX$, then $(X + Y)^p = X^p + Y^p$.


Exercise VIII.5.6 Let Z be a matrix of order n = pr having a special
form
$$Z = \begin{pmatrix}
R & A_1 & 0 & \cdots & 0 \\
0 & \omega R & A_2 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
A_p & 0 & \cdots & \cdots & \omega^{p-1}R
\end{pmatrix},$$
where R commutes with $A_1, A_2, \ldots, A_p$. Use the result of the preceding
exercise to show that $\mathrm{tr}\, Z^k = 0$ if k is not an integral multiple of p. Use this
to show that if $\lambda$ is an eigenvalue of Z, then so are $\omega\lambda, \omega^2\lambda, \ldots, \omega^{p-1}\lambda$.
(This is true even when R commutes with $A_1, \ldots, A_{p-1}$.)


For brevity, an n-tuple will be called p-Carrollian if n = pr and the
elements of the tuple can be enumerated as
$$(\alpha_1, \ldots, \alpha_r,\ \omega\alpha_1, \ldots, \omega\alpha_r,\ \ldots,\ \omega^{p-1}\alpha_1, \ldots, \omega^{p-1}\alpha_r), \tag{VIII.32}$$
where $\omega$ is the primitive pth root of unity. We have seen above several
examples of matrices whose eigenvalues are p-Carrollian.


Exercise VIII.5.7 Let $s_k$, $1 \le k \le n$, denote the elementary symmetric
polynomials in n variables. If $(\alpha_1, \ldots, \alpha_n)$ is a Carrollian n-tuple written
in the form (VIII.32), show that modulo a sign factor, we have
$$s_k(\alpha_1, \ldots, \alpha_n) = \begin{cases} s_j(\alpha_1^p, \ldots, \alpha_r^p) & \text{if } k = jp, \\ 0 & \text{if } k \ne jp. \end{cases}$$
Use this to show that if $\alpha_1, \ldots, \alpha_n$ are roots of the polynomial
$$f(z) = z^n + a_1z^{n-1} + \cdots + a_n,$$
then $\alpha_1^p, \ldots, \alpha_r^p$ are roots of the polynomial
$$F(z) = z^r + a_pz^{r-1} + a_{2p}z^{r-2} + \cdots + a_{rp}.$$

Proposition VIII.5.8 Let f, g be monic polynomials of degree n as in
(VIII.10). Suppose n = pr and the roots of f and g both are p-Carrollian.
Let $\gamma$ be as in (VIII.11). Then the roots of f and g can be labelled as
$\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ in such a way that
$$\max_j |\alpha_j^p - \beta_j^p| \ \le\ 4 \Big\{ \sum_{k=1}^{r} |a_{kp} - b_{kp}|\, \gamma^{(r-k)p} \Big\}^{1/r}. \tag{VIII.33}$$

Proof. Use Theorem VIII.2.4 and Exercise VIII.5.7. ■


Theorem VIII.5.9 Let n = pr and let A, B be two n x n matrices whose
eigenvalues are p-Carrollian. Then
$$d(\sigma(A^p), \sigma(B^p)) \ \le\ 4\, c_{r,p}\, M^{p - 1/r}\, \|A - B\|^{1/r}, \tag{VIII.34}$$


where $M = \max(\|A\|, \|B\|)$ and
$$c_{r,p} = \Big\{ \sum_{k=1}^{r} kp \binom{n}{kp} \Big\}^{1/r}. \tag{VIII.35}$$

Proof. See Exercise VIII.2.6 and use the preceding proposition. ■


The two results above give bounds not on the distance between the roots 
themselves but on that between their pth powers. If all of them are outside 
a neighbourhood of zero, a bound on the distance between the roots can 
be obtained from this. This needs the following lemma. 


Lemma VIII.5.10 Let x, y be complex numbers such that $|x| \ge \rho$, $|y| \ge \rho$
and $|x^p - y^p| \le C$. Then, for some k, $0 \le k \le p - 1$,
$$|x - \omega^k y| \ \le\ \frac{C}{\rho^{p-1}}, \tag{VIII.36}$$
where $\omega$ is the primitive pth root of unity.


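Proof. Compare the coefficients of t in the identity
$$\prod_{k=0}^{p-1} [t - (x - \omega^k y)] = (-1)^p [(x - t)^p - y^p]$$
to see that
$$s_{p-1}(x - y,\ x - \omega y,\ \ldots,\ x - \omega^{p-1}y) = p\, x^{p-1}.$$
The right-hand side has modulus at least $p\rho^{p-1}$ and the left-hand side
is a sum of p terms. Hence, at least one of them should have modulus at
least $\rho^{p-1}$. So, there exists k, $0 \le k \le p - 1$, such that
$$\prod_{\substack{j=0 \\ j \ne k}}^{p-1} |x - \omega^j y| \ \ge\ \rho^{p-1}.$$
But $\prod_{j=0}^{p-1} |x - \omega^j y| = |x^p - y^p| \le C$. Dividing this by the preceding
inequality gives $|x - \omega^k y| \le C/\rho^{p-1}$. This proves the lemma. ■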


When p = 2, the inequality (VIII.36) can be strengthened. To see this
note that
$$|x - y|^2 + |x + y|^2 = 2(|x|^2 + |y|^2) \ \ge\ 4\rho^2.$$
So, either $|x - y|$ or $|x + y|$ must be at least $2^{1/2}\rho$. Consequently, one
of them must be no larger than $C/2^{1/2}\rho$.
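
A quick numerical check of Lemma VIII.5.10 on one sample pair may be useful. The sketch below uses hypothetical values of x, y, p and $\rho$ (chosen for illustration only), takes $\omega = e^{2\pi i/p}$, and takes C to be exactly $|x^p - y^p|$, the smallest constant the hypothesis allows. The function name `closest_rotation` is an assumption of this sketch.

```python
import cmath

def closest_rotation(x, y, p):
    """Return the minimum over k = 0..p-1 of |x - w^k y|, w = exp(2*pi*i/p)."""
    w = cmath.exp(2j * cmath.pi / p)
    return min(abs(x - w ** k * y) for k in range(p))

# Hypothetical sample points, both of modulus at least rho.
p, rho = 3, 1.0
x, y = 1.2 + 0.3j, -0.5 + 1.1j

C = abs(x ** p - y ** p)          # smallest admissible C in the lemma
bound = C / rho ** (p - 1)        # right-hand side of (VIII.36)
dist = closest_rotation(x, y, p)
print(dist, bound)
```

The lemma only asserts that the first printed number does not exceed the second; how much slack there is depends on the sample chosen.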

Thus if the eigenvalues of A and B are p-Carrollian and all have modulus
larger than $\rho$, then $d(\sigma(A), \sigma(B))$ is bounded by $C/\rho^{p-1}$, where C is the
quantity on the right-hand side of (VIII.34). When p = 2, this bound
can be improved further to $C/\sqrt{2}\rho$. The major improvement over bounds
obtained in Section 2 is that now the bounds involve $\|A - B\|^{1/r}$ instead of
$\|A - B\|^{1/n}$. For low values of p, the factors $c_{r,p}$ can be evaluated explicitly.
For example, we have the combinatorial identity
$$\sum_{k=0}^{r} 2k \binom{n}{2k} = n\, 2^{n-2} \quad \text{if } n = 2r.$$
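
The combinatorial identity just quoted, namely that the sum of $2k\binom{n}{2k}$ over k = 0, ..., r equals $n2^{n-2}$ when n = 2r, can be confirmed for small n with exact integer arithmetic. The sketch below is only a check for a few values of r, not a proof; the function name `lhs` is an assumption of this sketch.

```python
from math import comb

def lhs(r):
    """Sum over k = 0..r of 2k * C(n, 2k), where n = 2r."""
    n = 2 * r
    return sum(2 * k * comb(n, 2 * k) for k in range(r + 1))

# Compare both sides of the identity for n = 2, 4, ..., 14.
for r in range(1, 8):
    n = 2 * r
    print(n, lhs(r), n * 2 ** (n - 2))
```

For n = 4, for instance, the left-hand side is 2*6 + 4*1 = 16, which agrees with 4 * 2^2.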


VIII.6 Problems 


Problem VIII.6.1. Let $f(z) = z^n + a_1z^{n-1} + \cdots + a_n$ be a monic
polynomial. Let $\mu_1, \ldots, \mu_n$ be the numbers $|a_k|^{1/k}$, $1 \le k \le n$, rearranged
in decreasing order. Show that all the roots of f are bounded (in absolute
value) by $\mu_1 + \mu_2$. This is an improvement on the result of Lemma VIII.2.1.
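
To get a feel for the size of this bound, one can evaluate it on a polynomial whose roots are known. The sketch below uses the hypothetical example f(z) = (z - 1)(z - 2)(z - 3), chosen here for illustration; it checks the bound on this one example and does not prove it.

```python
# Hypothetical example: f(z) = (z - 1)(z - 2)(z - 3) = z^3 - 6z^2 + 11z - 6.
coeffs = [-6.0, 11.0, -6.0]      # a_1, a_2, a_3
roots = [1.0, 2.0, 3.0]          # known by construction

# The numbers |a_k|^(1/k), k = 1..n, sorted in decreasing order.
mu = sorted((abs(a) ** (1.0 / k) for k, a in enumerate(coeffs, start=1)),
            reverse=True)

bound = mu[0] + mu[1]            # the bound mu_1 + mu_2 of the problem
print(bound, max(abs(z) for z in roots))
```

Here the two largest numbers are 6 and the square root of 11, so the bound is a little over 9, against a largest root of modulus 3.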


Problem VIII.6.2. Fill in the details in the following alternate proof of
Theorem VIII.3.1.

Let $\beta$ be an eigenvalue of B but not of A. If $Bx = \beta x$, then
$$x = S(\beta I - D)^{-1}S^{-1}(B - A)x.$$
Hence,
$$\|x\| \ \le\ \mathrm{cond}(S)\, \|B - A\|\ \|(\beta I - D)^{-1}\|\ \|x\|.$$
From this it follows that
$$\min_j |\beta - \alpha_j| \ \le\ \mathrm{cond}(S)\, \|B - A\|.$$

Notice that this proof too relies only on those properties of $\|\cdot\|$ that are
shared by many other norms (like the ones induced by the p-norms on $\mathbb{C}^n$).
See the remark following Theorem VIII.3.1.


Problem VIII.6.3. Let B be any matrix with entries $b_{ij}$. The disks
$$D_i = \Big\{ z : |z - b_{ii}| \le \sum_{j \ne i} |b_{ij}| \Big\}, \quad 1 \le i \le n,$$
are called the Geršgorin disks of B. The Geršgorin Disk Theorem
says that
$$\sigma(B) \subset \bigcup_{i=1}^{n} D_i,$$
and that any connected component of the set $\bigcup_i D_i$ contains as many
eigenvalues of B as the number of disks that form this component.
The proof of this is outlined below.

Consider the vector norm $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$ on $\mathbb{C}^n$. The norm it induces
on operators is
$$\|A\|_{\infty \to \infty} = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|.$$

Let D be the diagonal of B, and let $H = B - D$. Let $\beta$ be an eigenvalue
of B but not of D. Then
$$\beta I - B = \beta I - H - D = (\beta I - D)[I - (\beta I - D)^{-1}H].$$
Since $\beta I - B$ is not invertible, neither is the matrix in the square brackets.
Hence,
$$1 \ \le\ \|(\beta I - D)^{-1}H\|_{\infty \to \infty}.$$
From this, the first part of the theorem follows. The second part follows
from the continuity argument we have used often. Let $B(t) = D + tH$, $0 \le t \le 1$.
Then $B(0) = D$, $B(1) = B$; the eigenvalues of B(t) trace continuous
curves that join the eigenvalues of D to those of B.


Note that the proof of the first part is very similar to that of Theorem 
VIII.3.3; in fact, it is a special case of the earlier one. 
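
The first part of the theorem is easy to test on a small example. The sketch below computes the disks for a hypothetical 2 x 2 matrix (illustrative values, not from the text) and checks that each eigenvalue lies in at least one of them. The helper names `gershgorin_disks` and `eig2` are assumptions of this sketch.

```python
import cmath

def gershgorin_disks(b):
    """Return (centre, radius) pairs: centre b_ii, radius sum over j != i of |b_ij|."""
    n = len(b)
    return [(b[i][i], sum(abs(b[i][j]) for j in range(n) if j != i))
            for i in range(n)]

def eig2(m):
    """Eigenvalues of a 2x2 matrix via the quadratic formula."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + root) / 2, (tr - root) / 2]

# Hypothetical example matrix with one complex off-diagonal entry.
B = [[4.0, 1.0], [0.5j, -1.0]]

disks = gershgorin_disks(B)
# For each eigenvalue, test membership in the union of the disks
# (a small tolerance guards against rounding error).
covered = [any(abs(z - c) <= r + 1e-12 for c, r in disks) for z in eig2(B)]
print(disks, covered)
```

In this example the two disks are disjoint, so by the second part of the theorem each of them contains exactly one eigenvalue.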


Problem VIII.6.4. Given any matrix A, we can find a unitary U such
that
$$U^*AU = T = D + N,$$
where T is upper triangular, D is diagonal, and N is strictly upper
triangular and, hence, nilpotent. Such a reduction is not unique. The measure
of nonnormality of A is defined as
$$\Delta(A) = \inf \|N\|,$$
where the infimum is taken over all N that occur in the possible triangular
forms of A given above.

Now let B be any other matrix, and let $\beta$ be an eigenvalue of B but not
of A. From (VIII.19) we have
$$\|(D - \beta I + N)^{-1}\|^{-1} \ \le\ \|A - B\|.$$
Show that
$$(D - \beta I + N)^{-1} = [I + (D - \beta I)^{-1}N]^{-1}(D - \beta I)^{-1}$$
$$= \big[I - (D - \beta I)^{-1}N + \{(D - \beta I)^{-1}N\}^2 - \cdots + (-1)^{n-1}\{(D - \beta I)^{-1}N\}^{n-1}\big](D - \beta I)^{-1}.$$


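Let $\delta = \mathrm{dist}(\beta, \sigma(A))$. From this equation and the inequality before it
conclude that
$$\|A - B\|^{-1} \ \le\ \frac{1}{\delta} \Big[ 1 + \frac{\Delta(A)}{\delta} + \cdots + \Big(\frac{\Delta(A)}{\delta}\Big)^{n-1} \Big].$$
Now show that, with $s = s(\sigma(B), \sigma(A))$ and $\Delta = \Delta(A)$,
$$\frac{(s/\Delta)^n}{1 + (s/\Delta) + \cdots + (s/\Delta)^{n-1}} \ \le\ \frac{\|A - B\|}{\Delta}.$$
This is Henrici's Theorem.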

Let $f(t) = t^n/(1 + t + \cdots + t^{n-1})$. Then f(t) is close to $t^n$ for small values
of t, and to t for large values of t. Thus, when $\Delta(A)$ is close to 0, i.e.,
when A is close to being normal, the above bound leads to the asymptotic
inequality
$$s(\sigma(B), \sigma(A)) \ \le\ \|A - B\|.$$
In Theorem VI.3.3 we saw that if A is normal, then $s(\sigma(B), \sigma(A)) \le \|A - B\|$.


Problem VIII.6.5. Let $\nu$ be any norm on the space of matrices. The
$\nu$-measure of nonnormality of A is defined as
$$\Delta_\nu(A) = \inf \nu(N),$$
where N is as in Problem VIII.6.4. Suppose that the norm $\nu$ is such that
$\|A\| \le \nu(A)$ for all A. Show that $\Delta(A)$ in Henrici's Theorem can be replaced
by $\Delta_\nu(A)$.


Problem VIII.6.6. For the Hilbert-Schmidt norm $\|\cdot\|_2$, the measure of
nonnormality satisfies the inequality
$$\Delta_2(A) \ \le\ \Big(\frac{n^3 - n}{12}\Big)^{1/4} \|A^*A - AA^*\|_2^{1/2}$$
for every n x n matrix A. (The proof is a little intricate.)


Problem VIII.6.7. Let A have the Jordan canonical form $J = SAS^{-1}$.
Let m be the size of the largest Jordan block in J. Let B be any other
matrix. Show that for every eigenvalue $\beta$ of B there is an eigenvalue $\alpha$ of
A such that
$$\frac{|\beta - \alpha|^m}{(1 + |\beta - \alpha|)^{m-1}} \ \le\ \|S(A - B)S^{-1}\|.$$

Problem VIII.6.8. Let A, B, $\Gamma$ be as in Theorem VIII.3.4.
Let $(A - B)x_j = \lambda_j x_j$, where the vectors $x_j$ are orthonormal and the
eigenvalues $\lambda_j$ are indexed in such a way that $s_j := s_j(A - B) = |\lambda_j|$. Let
$y_j$ be the orthonormal vectors that satisfy the relations $(A - B)x_j = s_j y_j$.
Note that $y_j = \pm x_j$. Note also that the difference of $A\Gamma - \Gamma B$ and $(A - B)\Gamma$
is skew-Hermitian. Use this to show that, for $1 \le k \le n$,
$$\mathrm{Re} \sum_{j=1}^{k} \langle x_j, (A\Gamma - \Gamma B)y_j \rangle = \mathrm{Re} \sum_{j=1}^{k} \langle x_j, (A - B)\Gamma y_j \rangle = \sum_{j=1}^{k} s_j \langle y_j, \Gamma y_j \rangle \ \ge\ \gamma \sum_{j=1}^{k} s_j.$$
Use this to give an alternate proof of Theorem VIII.3.4. (See Problem
III.6.6.)


Problem VIII.6.9. Fill in the details in the following proof of Corollary
VIII.3.7. Let $D = \Gamma - \gamma I$. Then
$$\|A\Gamma - \Gamma B\|_2^2 = \|(AD - DB) + \gamma(A - B)\|_2^2$$
$$= \|AD - DB\|_2^2 + \gamma^2 \|A - B\|_2^2 + 2\gamma\, \mathrm{Re}\ \mathrm{tr}\, (AD - DB)^*(A - B).$$
So, it suffices to show that the last term is positive. This can be seen by
writing
$$2\, \mathrm{Re}\ \mathrm{tr}\, (AD - DB)^*(A - B) = \mathrm{tr}\, \{(AD - DB)^*(A - B) + (A - B)^*(AD - DB)\}$$
and then using cyclicity of the trace to reduce this to
$$\mathrm{tr}\, D[(A - B)^*(A - B) + (A - B)(A - B)^*].$$


Problem VIII.6.10. (i) Let X be a contractive matrix; i.e., let $\|X\| \le 1$.
Show that there exist unitaries U and V such that $X = \frac{1}{2}(U + V)$. Use this
to show that if $D_1$ and $D_2$ are real diagonal matrices, then
$$|||D_1X - XD_2||| \ \le\ |||\mathrm{Eig}^{\downarrow}(D_1) - \mathrm{Eig}^{\uparrow}(D_2)|||$$
for every unitarily invariant norm. [See (IV.62).]
(ii) Let $A = SD_1S^{-1}$, $B = TD_2T^{-1}$, where S and T are invertible
matrices and $D_1$, $D_2$ are real diagonal matrices. Show that
$$|||A - B||| \ \le\ \mathrm{cond}(S)\,\mathrm{cond}(T)\, |||\mathrm{Eig}^{\downarrow}(A) - \mathrm{Eig}^{\uparrow}(B)|||.$$

Problem VIII.6.11. Let A and B be any two diagonalisable matrices with
eigenvalues $\lambda_1, \ldots, \lambda_n$ and $\mu_1, \ldots, \mu_n$, respectively. Let $A = SD_1S^{-1}$,
$B = TD_2T^{-1}$, where S and T are invertible matrices and $D_1$, $D_2$ are diagonal
matrices. Show that
$$\|A - B\|_2 \ \le\ \mathrm{cond}(S)\,\mathrm{cond}(T)\, \max_\pi \Big( \sum_{i=1}^{n} |\lambda_i - \mu_{\pi(i)}|^2 \Big)^{1/2},$$
where $\pi$ varies over all permutations on n symbols. [See Theorem VI.4.1.]


Problem VIII.6.12. Let A be a Hermitian matrix with eigenvalues
$\alpha_1, \ldots, \alpha_n$. Let B be any other matrix. For $1 \le j \le n$, let
$$D_j = \{ z : |z - \alpha_j| \le \|A - B\|,\ |\mathrm{Im}\, z| \le \|\mathrm{Im}(A - B)\| \}.$$
The regions $D_j$ are disks flattened on the top and bottom by horizontal
lines. Show that the eigenvalues of B are contained in $\bigcup_j D_j$, and that each
connected component of this set contains as many eigenvalues of A as of B.


Problem VIII.6.13. Let $\mathcal{R}$ be a real vector space whose elements are
matrices with real eigenvalues. Show that the function $\sum_{j=1}^{k} \lambda_j^{\downarrow}(A)$ is a convex
function of A on this space for $1 \le k \le n$. Show that the function $\sum_{j=1}^{k} \lambda_j^{\uparrow}(A)$
is concave on $\mathcal{R}$.


Problem VIII.6.14. If $R_1$ is invertible, then
$$\begin{pmatrix} R_1 & A_1 \\ A_2 & R_2 \end{pmatrix} = \begin{pmatrix} R_1 & 0 \\ A_2 & R_2 - A_2R_1^{-1}A_1 \end{pmatrix} \begin{pmatrix} I & R_1^{-1}A_1 \\ 0 & I \end{pmatrix}.$$
Use this to show that if
$$Z = \begin{pmatrix} R & A_1 \\ A_2 & -R \end{pmatrix}$$
and R commutes with $A_1$, then Z and $-Z$ have the same eigenvalues. (Show
that they have the same characteristic polynomials.) This gives another
proof of the statement at the end of Exercise VIII.5.6, for p = 2. The same
method works for p > 2. For instance, the case p = 3 is dealt with as
follows. If $R_1$, $R_2$ are invertible, then
$$\begin{pmatrix} R_1 & A_1 & 0 \\ 0 & R_2 & A_2 \\ A_3 & 0 & R_3 \end{pmatrix}
= \begin{pmatrix} R_1 & 0 & 0 \\ 0 & R_2 & 0 \\ A_3 & -A_3R_1^{-1}A_1 & R_3 + A_3R_1^{-1}A_1R_2^{-1}A_2 \end{pmatrix}
\begin{pmatrix} I & R_1^{-1}A_1 & 0 \\ 0 & I & R_2^{-1}A_2 \\ 0 & 0 & I \end{pmatrix}.$$
Derive similar factorisations for p > 3, and use this to prove the statement
at the end of Exercise VIII.5.6.


VIII.7 Notes and References


Many of the topics in this chapter have been presented earlier in R. Bhatia, 
Perturbation Bounds for Matrix Eigenvalues, Longman, 1987, and in G.W.
Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990.
Some results that were proved after the publication of these books have, of 
course, been included here. 

The first major results on perturbation of roots of polynomials were 
proved by A. Ostrowski, Recherches sur la méthode de Graeffe et les zéros
des polynômes et des séries de Laurent, Acta Math., 72 (1940), 99-257. See
also Appendices A and B of his book Solution of Equations and Systems of 
Equations, Academic Press, 1960. Theorem VIII.2.2 is due to Ostrowski. 
Using this he proved an inequality weaker than (VIII.15); this had a factor 
(2n — 1) instead of 4. The argument used by him is the one followed in 
Exercise VIII.1.3. 

Ostrowski was also the first to derive perturbation bounds for eigenvalues 
of arbitrary matrices in his paper Über die Stetigkeit von charakteristischen
Wurzeln in Abhängigkeit von den Matrizenelementen, Jber. Deut. Mat.-Verein,
60 (1957) 40-42. See also Appendix K of his book cited above.
The inequality he proved involved the matrix norm $\|A\|_L = \frac{1}{n} \sum_{i,j} |a_{ij}|$,


which is easy to compute but is not unitarily invariant. With this norm, 
his inequality is like the one in (VIII.4). 

An inequality for d(o(A), o(B)) in terms of the unitarily invariant 
Hilbert-Schmidt norm was proved by R. Bhatia and K.K. Mukherjea, On 
the rate of change of spectra of operators, Linear Algebra Appl., 27 (1979) 
147-157. They followed the approach in Exercise VIII.2.6 and, after a little 
tidying up, their result looks like (VIII.4) but with the larger norm $\|\cdot\|_2$
instead of $\|\cdot\|$. This approach was followed, to a greater success, in R. Bhatia
and S. Friedland, Variation of Grassmann powers and spectra, Linear
Algebra Appl., 40 (1981) 1-18. In this paper, the norm $\|\cdot\|$ was used and
an inequality slightly weaker than (VIII.4) was proved.

An improvement of these inequalities in which (2n - 1) is replaced by
n was made by L. Elsner, On the variation of the spectra of matrices,
Linear Algebra Appl., 47 (1982) 127-138. The major insightful observation 
was that the Matching Theorem does not exploit the symmetry between 
the polynomials f and g, nor the matrices A and B, under consideration. 
Theorem VIII.1.1 is also due to L. Elsner, An optimal bound for the spectral 
variation of two matrices, Linear Algebra Appl., 71 (1985) 77-80. 

The argument using Chebyshev polynomials, that we have employed in 
Sections VIII.1 and VIII.2, seems to have been first used by A. Schönhage,
Quasi-GCD computations, J. Complexity, 1(1985) 118-137. (See Theorem 
2.7 of this paper.) It was discovered independently by D. Phillips, Improv- 
ing spectral variation bounds with Chebyshev polynomials, Linear Algebra 
Appl., 133 (1990) 165-173. Phillips proved a weaker inequality than (VIII.8)
with a factor 8 instead of 4.

This argument was somewhat simplified and used again by R. Bhatia, 
L. Elsner, and G. Krause, Bounds for the variation of the roots of a poly- 
nomial and the eigenvalues of a matrix, Linear Algebra Appl., 142 (1990) 
195-209. Theorems VIII.1.5 and VIII.2.4 (and their proofs) have been 
taken from this paper. Using finer results from Chebyshev approximation, 
G. Krause has shown that the factor 4 occurring in these inequalities can 
be replaced by 3.08. See his paper Bounds for the variation of matrix eigen- 
values and polynomial roots, Linear Algebra Appl., 208/209 (1994) 73-82. 
It was shown by Bhatia, Elsner, and Krause in the paper cited above that, 
in the inequality (VIII.15), the factor 4 cannot be replaced by anything 
smaller than 2. 

Theorems VIII.3.1 and VIII.3.3 were proved in the very influential paper, 
F.L. Bauer and C.T. Fike, Norms and exclusion theorems, Numer. Math., 
2 (1960) 137-141. See the discussion in Stewart and Sun, p. 177. 

The basic idea behind results in Section VIII.3 from Theorem VIII.3.4 
onwards is due to W. Kahan, Inclusion theorems for clusters of eigenvalues 
of Hermitian matrices, Technical Report, Computer Science Department, 
University of Toronto, 1967. Theorem VIII.3.4 for the special case of the 
operator norm is proved in this report. The inequality (VIII.23) is due 
to J.-G. Sun, On the perturbation of the eigenvalues of a normal matrix, 
Math. Numer. Sinica, 6(1984) 334-336. The ideas of Kahan’s and Sun’s 
proofs are outlined in Problems VIII.6.8 and VIII.6.9. Theorem VIII.3.4,
in its generality, was proved in R. Bhatia, C. Davis, and F. Kittaneh, 
Some inequalities for commutators and an application to spectral variation, 
Aequationes Math., 41(1991) 70-78. The three corollaries were also proved 
there. These authors then used their commutator inequalities to derive 
weaker versions of Theorems VIII.3.9 and VIII.3.10; in all these, the square
root in the inequalities (VIII.25) and (VIII.26) is missing. For the operator
norm alone, the inequality (VIII.25) was proved by T.-X. Lu, Perturbation 
bounds for eigenvalues of symmetrizable matrices, Numerical Mathemat- 
ics: a Journal of Chinese Universities, 16(1994) 177-185 (in Chinese). The 
inequalities (VIII.25)-(VIII.27) have been proved recently by R. Bhatia,
F. Kittaneh and R.-C. Li, Some inequalities for commutators and an
application to spectral variation II, Linear and Multilinear Algebra, to appear.

The inequality in Problem VIII.6.10 was proved in R. Bhatia, 
L. Elsner, and G. Krause, Spectral variation bounds for diagonalisable ma- 
trices, Preprint 94-098, SFB 343, University of Bielefeld. Example VIII.3.8 
(and another example illustrating the same phenomenon for the trace 
norm) was constructed in this paper. The inequality in Problem VIII.6.11 
was found by L. Elsner and S. Friedland, Singular values, doubly stochastic 
matrices and applications, Linear Algebra Appl., 220(1995) 161-169. 

The results of Section VIII.4 were discovered by P. D. Lax, Differen- 
tial equations, difference equations and matrix theory, Comm. Pure Appl. 
Math., 11(1958) 175-194. Lax was motivated by the theory of linear partial
differential equations of hyperbolic type, and his proofs used techniques
from this theory. The paper of Lax was followed by one by H.F. Wein- 
berger, Remarks on the preceding paper of Lax, Comm. Pure Appl. Math., 
11 (1958) 195-196. He gave simple matrix theoretic proofs of these
theorems, which we have reproduced here. L. Gårding later pointed out that
these are special cases of his results for hyperbolic polynomials that ap- 
peared in his papers Linear hyperbolic partial differential equations with 
constant coefficients, Acta Math., 84(1951) 1-62, and An inequality for hy- 
perbolic polynomials, J. Math. Mech., 8(1959) 957-966. A characterisation of 
the kind of spaces R discussed in Section VIII.4 was given by H. Wielandt, 
Lineare Scharen von Matrizen mit reellen Eigenwerten, Math. Z., 53(1950) 
219-225. 

It was observed by R. Bhatia, On the rate of change of spectra of oper- 
ators II, Linear Algebra Appl., 36 (1981) 25-32, that better perturbation 
bounds can be obtained for matrices whose eigenvalues occur in pairs $\pm\lambda$.
This was carried further in the paper Symmetries and variation of spectra, 
Canadian J. Math., 44 (1992) 1155-1166, by R. Bhatia and L. Elsner, who 
considered matrices whose eigenvalues are p-Carrollian. See also the paper 
by R. Bhatia and L. Elsner, The q-binomial theorem and spectral symmetry, 
Indag. Math., N.S., 4(1993) 11-16. The material in Section VIII.5 is taken 
from these three papers. 

The bound in Problem VIII.6.1 is due to Lagrange. There are several 
interesting and useful bounds known for the roots of a polynomial. Since 
the roots of a polynomial are the eigenvalues of its companion matrix, 
some of these bounds can be proved by using bounds for eigenvalues. An 
interesting discussion may be found in Horn and Johnson, Matrix Analysis, 
pages 316-319.

The Geršgorin Disk Theorem was proved in S.A. Geršgorin, Über die
Abgrenzung der Eigenwerte einer Matrix, Izv. Akad. Nauk SSSR, Ser. Fiz.-Mat.,
6(1931) 749-754. A matrix is called diagonally dominant if
$|a_{ii}| > \sum_{j \ne i} |a_{ij}|$, $1 \le i \le n$. Every diagonally dominant matrix is nonsingular.
Geršgorin's Theorem is a corollary. This theorem is applied to the study of
several perturbation problems in J.H. Wilkinson, The Algebraic Eigenvalue 
Problem. A comprehensive discussion is also given in Horn and Johnson, 
Matrix Analysis. 

The results of Problems VIII.6.4, VIII.6.5, and VIII.6.6 are due to 
P. Henrici, Bounds for iterates, inverses, spectral variation and fields of 
values of nonnormal matrices, Numer. Math., 4 (1962) 24-39. Several other 
very interesting results that involve the measure of nonnormality are proved 
in this paper. For example, we know that the numerical range W(A) of a 
matrix A contains the convex hull H(A) of the eigenvalues of A, and that 
the two sets are equal if A is normal. Henrici gives a bound for the distance 
between the boundaries of H(A) and W(A) in terms of the measure of 
nonnormality of A.

There are several different ways to measure the nonnormality of a matrix.
Problem VIII.6.6 relates two such measures by an inequality. The relations
between several different measures of nonnormality are discussed in
L. Elsner and M.H.C. Paardekooper, On measures of nonnormality of matrices,
Linear Algebra Appl., 92(1987) 107-124.

Is a nearly normal matrix near to an (exactly) normal matrix? More
precisely, for every $\varepsilon > 0$, does there exist a $\delta > 0$ such that if
$\|A^*A - AA^*\| < \delta$ then there exists a normal $B$ such that $\|A - B\| < \varepsilon$?
The existence of such a $\delta$ for each fixed dimension $n$ was shown by C. Pearcy
and A. Shields, Almost commuting matrices, J. Funct. Anal., 33(1979) 332-338.
The problem of finding a $\delta$ depending only on $\varepsilon$ but not on the dimension $n$
is linked to several important questions in the theory of operator algebras.
This has been shown to have an affirmative solution in a recent paper: H. Lin,
Almost commuting selfadjoint matrices and applications, preprint, 1995.
No explicit formula for $\delta$ is given in this paper. In an infinite-dimensional
Hilbert space, the answer to this question is in the negative because of
index obstructions.

The inequality in Problem VIII.6.7 was proved in W. Kahan, B.N. Par- 
lett, and E. Jiang, Residual bounds on approximate eigensystems of non- 
normal matrices, SIAM J. Numer. Anal. 19(1982) 470-484. 

The inequality in Problem VIII.6.12 was proved by W. Kahan, Spectra 
of nearly Hermitian matrices, Proc. Amer. Math. Soc., 48(1975) 11-17. 

For 2 x 2 block-matrices, the idea of the argument in Problem VIII.6.14 
is due to M.D. Choi, Almost commuting matrices need not be nearly com- 
muting, Proc. Amer. Math. Soc. 102(1988) 529-533. This was extended to 
higher order block-matrices by R. Bhatia and L. Elsner, Symmetries and 
variation of spectra, cited above. 


IX


A Selection of Matrix Inequalities 


In this chapter we will prove several inequalities for matrices. From the 
vast collection of such inequalities, we have selected a few that are simple 
and widely useful. Though they are of different kinds, their proofs have 
common ingredients already familiar to us from earlier chapters. 


IX.1 Some Basic Lemmas 


If A and B are any two matrices, then AB and BA have the same eigenvalues.
(See Exercise I.3.7.) Hence, if f(A) is any function on the space of
matrices that depends only on the eigenvalues of A, then f(AB) = f(BA).
Examples of such functions are the spectral radius, the trace, and the
determinant. If A is normal, then the spectral radius spr(A) is equal to $\|A\|$.
Using this, we can prove the following two useful propositions.


Proposition IX.1.1 Let A,B be any two matrices such that the product 
AB is normal. Then, for every unitarily invariant norm, we have 


$$ |||AB||| \le |||BA|||. \tag{IX.1} $$


Proof. For the operator norm this is an easy consequence of the two facts 
mentioned above; we have 


$$ \|AB\| = \mathrm{spr}(AB) = \mathrm{spr}(BA) \le \|BA\|. $$


The general case needs more argument. Since AB is normal, $s_j(AB) = |\lambda_j(AB)|$,
where $|\lambda_1(AB)| \ge \cdots \ge |\lambda_n(AB)|$ are the eigenvalues of AB
arranged in decreasing order of magnitude. But $|\lambda_j(AB)| = |\lambda_j(BA)|$. By
Weyl’s Majorant Theorem (Theorem II.3.6), the vector $|\lambda(BA)|$ is weakly
majorised by the vector $s(BA)$. Hence we have the weak majorisation
$s(AB) \prec_w s(BA)$. From this the inequality (IX.1) follows. ∎


Proposition IX.1.2 Let A,B be any two matrices such that the product 
AB is Hermitian. Then, for every unitarily invariant norm, we have 


$$ |||AB||| \le |||\mathrm{Re}(BA)|||. \tag{IX.2} $$


Proof. The eigenvalues of BA, being the same as the eigenvalues of the
Hermitian matrix AB, are all real. So, by Proposition III.5.3, we have the
majorisation $\lambda(BA) \prec \lambda(\mathrm{Re}\,BA)$. From this we have the weak majorisation
$|\lambda(BA)| \prec_w |\lambda(\mathrm{Re}\,BA)|$. (See Examples II.3.5.) The rest of the argument
is the same as in the proof of the preceding proposition. ∎


Some of the inequalities proved in this chapter involve the matrix expo- 
nential. An extremely useful device in proving such results is the following 
theorem. 


Theorem IX.1.3 (The Lie Product Formula) For any two matrices A, B, 


$$ \lim_{m \to \infty} \left( \exp\frac{A}{m} \exp\frac{B}{m} \right)^m = \exp(A + B). \tag{IX.3} $$

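Proof. For any two matrices X, Y, and for m = 1, 2, ..., we have

$$ X^m - Y^m = \sum_{j=0}^{m-1} X^{m-1-j} (X - Y) Y^j. $$

Using this we obtain

$$ \|X^m - Y^m\| \le m M^{m-1} \|X - Y\|, \tag{IX.4} $$

where $M = \max(\|X\|, \|Y\|)$.

Now let $X_m = \exp\left(\frac{A+B}{m}\right)$, $Y_m = \exp\frac{A}{m} \exp\frac{B}{m}$, m = 1, 2, .... Then
$\|X_m\|$ and $\|Y_m\|$ both are bounded above by $\exp\left(\frac{\|A\| + \|B\|}{m}\right)$. From the
power series expansion for the exponential function, we see that

$$ X_m - Y_m = \left[ I + \frac{A+B}{m} + \frac{1}{2}\left(\frac{A+B}{m}\right)^2 + \cdots \right] - \left[ I + \frac{A}{m} + \frac{1}{2}\left(\frac{A}{m}\right)^2 + \cdots \right] \left[ I + \frac{B}{m} + \frac{1}{2}\left(\frac{B}{m}\right)^2 + \cdots \right] = O\left(\frac{1}{m^2}\right) $$

for large m. Hence, using the inequality (IX.4), we see that

$$ \|X_m^m - Y_m^m\| \le m \exp(\|A\| + \|B\|) \, O\left(\frac{1}{m^2}\right). $$

This goes to zero as $m \to \infty$. But $X_m^m = \exp(A + B)$ for all m. Hence,
$\lim_{m \to \infty} Y_m^m = \exp(A + B)$. This proves the theorem. ∎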
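
The convergence in the Lie Product Formula can be observed numerically. The following is a minimal sketch, not part of the original text: it uses plain Python lists as matrices, a truncated power series for the exponential, and the Frobenius norm to measure the error. All helper names (`mat_exp`, `lie_error`, and so on) and the particular matrices A, B are assumptions made for illustration.

```python
# Hypothetical helper names throughout; lists of lists stand in for matrices.

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_scale(c, X):
    return [[c * x for x in row] for row in X]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_exp(X, terms=40):
    """Truncated power series I + X + X^2/2! + ...; adequate for small norms."""
    result = identity(len(X))
    term = identity(len(X))
    for k in range(1, terms):
        term = mat_scale(1.0 / k, mat_mul(term, X))
        result = mat_add(result, term)
    return result

def mat_pow(X, m):
    result = identity(len(X))
    for _ in range(m):
        result = mat_mul(result, X)
    return result

def frobenius_distance(X, Y):
    return sum((x - y) ** 2 for rx, ry in zip(X, Y)
               for x, y in zip(rx, ry)) ** 0.5

def lie_error(A, B, m):
    """Distance between (exp(A/m) exp(B/m))^m and exp(A + B)."""
    Ym = mat_mul(mat_exp(mat_scale(1.0 / m, A)), mat_exp(mat_scale(1.0 / m, B)))
    return frobenius_distance(mat_pow(Ym, m), mat_exp(mat_add(A, B)))

# A non-commuting pair chosen only for illustration.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
errors = [lie_error(A, B, m) for m in (1, 10, 100, 1000)]
```

The proof bounds the error by $m \exp(\|A\| + \|B\|)\, O(1/m^2)$, so the entries of `errors` should shrink roughly in proportion to 1/m.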


The reader should compare the inequality (IX.4) with the inequality in 
Problem I.6.11. 


Exercise IX.1.4 Show that for any two matrices A, B 


$$ \lim_{m \to \infty} \left( \exp\frac{B}{2m} \exp\frac{A}{m} \exp\frac{B}{2m} \right)^m = \exp(A + B). $$


Exercise IX.1.5 Show that for any two matrices A, B 


$$ \lim_{t \to 0} \left( \exp\frac{tB}{2} \exp tA \exp\frac{tB}{2} \right)^{1/t} = \exp(A + B). $$


IX.2 Products of Positive Matrices 


In this section we prove some inequalities for the norm, the spectral radius, 
and the eigenvalues of the product of two positive matrices. 


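Theorem IX.2.1 Let A, B be positive matrices. Then

$$ \|A^s B^s\| \le \|AB\|^s \quad \text{for } 0 \le s \le 1. \tag{IX.5} $$

Proof. Let

$$ D = \{ s : 0 \le s \le 1,\ \|A^s B^s\| \le \|AB\|^s \}. $$

Then D is a closed subset of [0,1] and contains the points 0 and 1. So, to
prove the theorem, it suffices to prove that if s and t are in D then so is
$\frac{s+t}{2}$. We have

$$ \begin{aligned}
\left\| A^{\frac{s+t}{2}} B^{\frac{s+t}{2}} \right\|^2
&= \left\| B^{\frac{s+t}{2}} A^{s+t} B^{\frac{s+t}{2}} \right\|
 = \mathrm{spr}\left( B^{\frac{s+t}{2}} A^{s+t} B^{\frac{s+t}{2}} \right) \\
&= \mathrm{spr}( B^s A^{s+t} B^t ) \le \| B^s A^{s+t} B^t \| \\
&\le \| B^s A^s \| \, \| A^t B^t \| = \| A^s B^s \| \, \| A^t B^t \|.
\end{aligned} $$

At the first step we used the relation $\|T\|^2 = \|T^*T\|$, and at the last step
the relation $\|T^*\| = \|T\|$ for all T. If s, t are in D, this shows that

$$ \left\| A^{\frac{s+t}{2}} B^{\frac{s+t}{2}} \right\| \le \|AB\|^{(s+t)/2}, $$

and this proves the theorem. ∎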


An equivalent formulation of the above theorem is given below, with 
another proof that is illuminating. 


Theorem IX.2.2 If A, B are positive matrices with $\|AB\| \le 1$, then
$\|A^s B^s\| \le 1$ for $0 \le s \le 1$.


Proof. We can assume that A > 0. The general case follows from this by
a continuity argument. We then have the chain of implications

$$ \begin{aligned}
\|AB\| \le 1 &\Rightarrow \|AB^2A\| \le 1 \Rightarrow AB^2A \le I \\
&\Rightarrow B^2 \le A^{-2} && \text{(by Lemma V.1.5)} \\
&\Rightarrow B^{2s} \le A^{-2s} && \text{(by Theorem V.1.9)} \\
&\Rightarrow A^s B^{2s} A^s \le I && \text{(by Lemma V.1.5)} \\
&\Rightarrow \|A^s B^{2s} A^s\| \le 1 \Rightarrow \|A^s B^s\| \le 1.
\end{aligned} $$
∎

Another equivalent formulation is the following theorem.

Theorem IX.2.3 Let A, B be positive matrices. Then

$$ \|AB\|^t \le \|A^t B^t\| \quad \text{for } t \ge 1. \tag{IX.6} $$

Proof. From Theorem IX.2.1, we have $\|A^{1/t} B^{1/t}\| \le \|AB\|^{1/t}$ for $t \ge 1$.
Replace A, B by $A^t$, $B^t$, respectively. ∎
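
The inequalities (IX.5) and (IX.6) can be spot-checked on random examples. The sketch below is an illustration under stated assumptions, not the author's method: it restricts to real symmetric positive definite 2 x 2 matrices, computes fractional powers from the closed-form spectral decomposition, and allows a small relative tolerance for rounding. The helper names are hypothetical.

```python
import math
import random

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]

def sym_pow(A, s):
    """A^s for a real symmetric positive definite 2x2 matrix A."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # rotation angle diagonalising A
    c, sn = math.cos(theta), math.sin(theta)
    l1 = a * c * c + 2.0 * b * c * sn + d * sn * sn  # eigenvalue for (c, sn)
    l2 = a * sn * sn - 2.0 * b * c * sn + d * c * c  # eigenvalue for (-sn, c)
    p1, p2 = l1 ** s, l2 ** s
    off = (p1 - p2) * c * sn
    return [[p1 * c * c + p2 * sn * sn, off],
            [off, p1 * sn * sn + p2 * c * c]]

def op_norm(M):
    """Operator norm: square root of the largest eigenvalue of M^T M."""
    G = mat_mul(transpose(M), M)
    gap = math.sqrt((G[0][0] - G[1][1]) ** 2 + 4.0 * G[0][1] ** 2)
    return math.sqrt(0.5 * (G[0][0] + G[1][1] + gap))

def random_positive(rng):
    """Random positive definite matrix of the form M^T M + 0.1 I."""
    M = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    G = mat_mul(transpose(M), M)
    G[0][0] += 0.1
    G[1][1] += 0.1
    return G

def holds_ix5(A, B, s):
    """Check ||A^s B^s|| <= ||AB||^s, intended for 0 <= s <= 1."""
    lhs = op_norm(mat_mul(sym_pow(A, s), sym_pow(B, s)))
    return lhs <= op_norm(mat_mul(A, B)) ** s * (1.0 + 1e-9)

def holds_ix6(A, B, t):
    """Check ||AB||^t <= ||A^t B^t||, intended for t >= 1."""
    rhs = op_norm(mat_mul(sym_pow(A, t), sym_pow(B, t)))
    return op_norm(mat_mul(A, B)) ** t <= rhs * (1.0 + 1e-9)

rng = random.Random(0)
pairs = [(random_positive(rng), random_positive(rng)) for _ in range(50)]
ix5_ok = all(holds_ix5(A, B, k / 10.0) for A, B in pairs for k in range(11))
ix6_ok = all(holds_ix6(A, B, 1.0 + k / 4.0) for A, B in pairs for k in range(9))
```

A random test of this kind cannot prove the theorems; it only confirms that no counterexample shows up among the sampled pairs.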


Exercise IX.2.4 Let A, B be positive matrices. Then
(i) $\|A^{1/t} B^{1/t}\|^t$ is a monotonically decreasing function of t on $(0, \infty)$.
(ii) $\|A^t B^t\|^{1/t}$ is a monotonically increasing function of t on $(0, \infty)$.


In Section 5 we will see that the inequalities (IX.5) and (IX.6) are, in 
fact, valid for all unitarily invariant norms. 

Results akin to the ones above can be proved for the spectral radius in 
place of the norm. This is done below. 

If A and B are positive, the eigenvalues of AB are positive. (They are
the same as the eigenvalues of the positive matrix $A^{1/2} B A^{1/2}$.) If T is
any matrix with positive eigenvalues, we will enumerate its eigenvalues as
$\lambda_1(T) \ge \lambda_2(T) \ge \cdots \ge \lambda_n(T) \ge 0$. Thus $\lambda_1(T)$ is equal to the spectral
radius spr(T).


Theorem IX.2.5 If A, B are positive matrices with $\lambda_1(AB) \le 1$, then
$\lambda_1(A^s B^s) \le 1$ for $0 \le s \le 1$.

Proof. As in the proof of Theorem IX.2.2, we can assume that A > 0.
We then have the chain of implications

$$ \begin{aligned}
\lambda_1(AB) \le 1 &\Rightarrow \lambda_1(A^{1/2} B A^{1/2}) \le 1 \Rightarrow A^{1/2} B A^{1/2} \le I \\
&\Rightarrow B \le A^{-1} \Rightarrow B^s \le A^{-s} \Rightarrow A^{s/2} B^s A^{s/2} \le I \\
&\Rightarrow \lambda_1(A^{s/2} B^s A^{s/2}) \le 1 \Rightarrow \lambda_1(A^s B^s) \le 1.
\end{aligned} $$

This proves the theorem. ∎


It should be noted that all implications in this proof and that of Theorem
IX.2.2 are reversible with one exception: if $A \ge B \ge 0$, then $A^s \ge B^s$ for
$0 \le s \le 1$, but the converse is not true.


Theorem IX.2.6 Let A, B be positive matrices. Then

$$ \lambda_1(A^s B^s) \le \lambda_1^s(AB) \quad \text{for } 0 \le s \le 1. \tag{IX.7} $$

Proof. Let $\lambda_1(AB) = \alpha^2$. If $\alpha \ne 0$, we have $\lambda_1\left(\frac{A}{\alpha} \frac{B}{\alpha}\right) = 1$. So, by Theorem
IX.2.5, $\lambda_1(A^s B^s) \le \alpha^{2s} = \lambda_1^s(AB)$.

If $\alpha = 0$, we have $\lambda_1(A^{1/2} B A^{1/2}) = 0$, and hence $A^{1/2} B A^{1/2} = 0$. From
this it follows that the range of A is contained in the kernel of B. But then
$A^{s/2} B^s A^{s/2} = 0$, and hence, $\lambda_1(A^s B^s) = 0$. ∎


Exercise IX.2.7 Let A, B be positive matrices. Show that

$$ \lambda_1^t(AB) \le \lambda_1(A^t B^t) \quad \text{for } t \ge 1. \tag{IX.8} $$

Exercise IX.2.8 Let A, B be positive matrices. Show that
(i) $[\lambda_1(A^{1/t} B^{1/t})]^t$ is a monotonically decreasing function of t on $(0, \infty)$.
(ii) $[\lambda_1(A^t B^t)]^{1/t}$ is a monotonically increasing function of t on $(0, \infty)$.


Using familiar arguments involving antisymmetric tensor products, we 
can now obtain stronger results. 


Theorem IX.2.9 Let A, B be positive matrices. Then, for $0 < t < u < \infty$,
we have the weak majorisation

$$ \lambda^{1/t}(A^t B^t) \prec_w \lambda^{1/u}(A^u B^u). \tag{IX.9} $$

Proof. For k = 1, 2, ..., n, consider the operators $\wedge^k A$ and $\wedge^k B$. The
result of Exercise IX.2.8(ii) applied to these operators in place of A, B
yields the inequalities

$$ \prod_{j=1}^{k} \lambda_j^{1/t}(A^t B^t) \le \prod_{j=1}^{k} \lambda_j^{1/u}(A^u B^u) \tag{IX.10} $$

for k = 1, 2, ..., n. The assertion of II.3.5(vii) now leads to the majorisation
(IX.9). ∎

Theorem IX.2.10 Let A, B be positive matrices. Then for every unitarily
invariant norm we have

$$ |||B^t A^t B^t||| \le |||(BAB)^t||| \quad \text{for } 0 \le t \le 1, \tag{IX.11} $$

$$ |||(BAB)^t||| \le |||B^t A^t B^t||| \quad \text{for } t \ge 1. \tag{IX.12} $$

Proof. We have

$$ \|B^t A^t B^t\| = \|(A^{t/2} B^t)^* (A^{t/2} B^t)\| = \|A^{t/2} B^t\|^2 \le \|A^{1/2} B\|^{2t}, $$

for $0 \le t \le 1$, by Theorem IX.2.1. So

$$ \|B^t A^t B^t\| \le \|BAB\|^t \quad \text{for } 0 \le t \le 1. $$

This is the same as saying that

$$ s_1(B^t A^t B^t) \le s_1^t(BAB). $$

Replacing A and B by their antisymmetric tensor powers, we obtain, for
$1 \le k \le n$,

$$ \prod_{j=1}^{k} s_j(B^t A^t B^t) \le \prod_{j=1}^{k} s_j^t(BAB). $$

By the argument used in the preceding theorem, this gives the majorisation

$$ s(B^t A^t B^t) \prec_w s([BAB]^t), $$

which gives the inequality (IX.11).
The inequality (IX.12) is proved in exactly the same way. ∎


Exercise IX.2.11 Derive (as a special case of the above theorem) the
following inequality of Araki-Lieb-Thirring. Let A, B be positive matrices, and
let s, t be positive real numbers with $t \ge 1$. Then

$$ \mathrm{tr}\,[(B^{1/2} A B^{1/2})^{st}] \le \mathrm{tr}\,[(B^{t/2} A^t B^{t/2})^s]. \tag{IX.13} $$


IX.3 Inequalities for the Exponential Function 


For every complex number z, we have $|e^z| = |e^{\mathrm{Re}\, z}|$. Our first theorem is a
matrix version of this. 


Theorem IX.3.1 Let A be any matrix. Then

$$ |||e^A||| \le |||e^{\mathrm{Re}\, A}||| \tag{IX.14} $$

for every unitarily invariant norm.

Proof. For each positive integer m, we have $\|A^m\| \le \|A\|^m$. This is the
same as saying that $s_1^2(A^m) \le s_1^{2m}(A)$ or $s_1(A^{*m} A^m) \le s_1^m(A^*A)$.
Replacing A by $\wedge^k A$, we obtain for $1 \le k \le n$

$$ \prod_{j=1}^{k} s_j(A^{*m} A^m) \le \prod_{j=1}^{k} s_j^m(A^*A). $$

Now, if we replace A by $e^{A/m}$, we obtain

$$ \prod_{j=1}^{k} s_j(e^{A^*} e^{A}) \le \prod_{j=1}^{k} s_j\left( [e^{A^*/m} e^{A/m}]^m \right). $$

Letting $m \to \infty$, and using the Lie Product Formula, we obtain

$$ \prod_{j=1}^{k} s_j(e^{A^*} e^{A}) \le \prod_{j=1}^{k} s_j(e^{A^* + A}). $$

Taking square roots, we get

$$ \prod_{j=1}^{k} s_j(e^{A}) \le \prod_{j=1}^{k} s_j(e^{\mathrm{Re}\, A}). $$

This gives the majorisation

$$ s(e^A) \prec_w s(e^{\mathrm{Re}\, A}) $$

(see II.3.5(vii)), and hence the inequality (IX.14). ∎


It is easy to construct an example of a 2 x 2 matrix A, for which $|||e^A|||$
and $|||e^{\mathrm{Re}\, A}|||$ are not equal.
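
One such example, offered here as an illustration rather than as the example the author had in mind, is the nilpotent matrix A with rows (0, 1) and (0, 0). Both exponentials are then available in closed form, and the operator norms can be compared directly. The function name `op_norm_2x2` is hypothetical.

```python
import math

def op_norm_2x2(M):
    """Largest singular value of a real 2x2 matrix M."""
    (a, b), (c, d) = M
    # Entries of the Gram matrix M^T M.
    g11, g12, g22 = a * a + c * c, a * b + c * d, b * b + d * d
    gap = math.sqrt((g11 - g22) ** 2 + 4.0 * g12 * g12)
    return math.sqrt(0.5 * (g11 + g22 + gap))

# A = [[0, 1], [0, 0]] satisfies A^2 = 0, so exp(A) = I + A.
exp_A = [[1.0, 1.0], [0.0, 1.0]]

# Re A = [[0, 1/2], [1/2, 0]], so
# exp(Re A) = cosh(1/2) I + sinh(1/2) [[0, 1], [1, 0]].
ch, sh = math.cosh(0.5), math.sinh(0.5)
exp_re_A = [[ch, sh], [sh, ch]]

norm_exp_A = op_norm_2x2(exp_A)        # equals (1 + sqrt(5)) / 2
norm_exp_re_A = op_norm_2x2(exp_re_A)  # equals exp(1/2)
```

Here the operator norm of $e^A$ is the golden ratio, about 1.618, while that of $e^{\mathrm{Re}\, A}$ is $e^{1/2}$, about 1.649, so the inequality (IX.14) is strict for this matrix.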


Our next theorem is valid for a large class of functions. It will be conve- 
nient to give this class a name. 


Definition IX.3.2 A continuous complex-valued function f on the space 
of matrices will be said to belong to the class T if it satisfies the following 
two properties: 


(i) f(XY) = f(YX) for all X, Y.
(ii) $|f(X^{2m})| \le f([XX^*]^m)$ for all X, and for m = 1, 2, ....

Exercise IX.3.3 (i) The functions trace and determinant are in T.

(ii) For every k, $1 \le k \le n$, the function $\varphi_k(X) = \mathrm{tr}\, \wedge^k X$ is in T.
(These are the coefficients in the characteristic polynomial of X.)

(iii) Let $\lambda_j(X)$ denote the eigenvalues of X arranged so that
$|\lambda_1(X)| \ge |\lambda_2(X)| \ge \cdots \ge |\lambda_n(X)|$. Then, for $1 \le k \le n$, the function
$f_k(X) = \prod_{j=1}^{k} \lambda_j(X)$ is in T, and so is the function $|f_k(X)|$. [Hint: Use Theorem
II.3.6.]

(iv) For $1 \le k \le n$, the function $g_k(X) = \sum_{j=1}^{k} |\lambda_j(X)|$ is in T.

(v) Every symmetric gauge function of the numbers $|\lambda_1(X)|, \ldots, |\lambda_n(X)|$
is in T.


Exercise IX.3.4 (i) If f is any complex valued function on the space of
matrices that satisfies the condition (ii) in Definition IX.3.2, then $f(A) \ge 0$
if $A \ge 0$. In particular, $f(e^A) \ge 0$ for every Hermitian matrix A.

(ii) If f satisfies both conditions (i) and (ii) in Definition IX.3.2, then
$f(AB) \ge 0$ if A and B are both positive. In particular, $f(e^A e^B) \ge 0$ if A
and B are Hermitian.


The principal result about the class T is the following. 


Theorem IX.3.5 Let f be a function in the class T. Then for all matrices
A, B, we have

$$ |f(e^{A+B})| \le f(e^{\mathrm{Re}\, A} e^{\mathrm{Re}\, B}). \tag{IX.15} $$

Proof. For each positive integer m, we have for all X, Y

$$ \begin{aligned}
|f([XY]^{2^m})| &\le f([(XY)(XY)^*]^{2^{m-1}}) \\
&= f([XYY^*X^*]^{2^{m-1}}) \\
&= f([X^*XYY^*]^{2^{m-1}}).
\end{aligned} $$

Here, the inequality at the first step is a consequence of the property (ii),
and the equality at the last step is a consequence of the property (i) of
functions in T. Repeat this argument to obtain

$$ \begin{aligned}
|f([XY]^{2^m})| &\le f([(X^*X)^2 (YY^*)^2]^{2^{m-2}}) \\
&\le \cdots \le f([X^*X]^{2^{m-1}} [YY^*]^{2^{m-1}}).
\end{aligned} $$

Now let A, B be any two matrices. Put $X = e^{A/2^m}$ and $Y = e^{B/2^m}$ in
the above inequality to obtain

$$ \left| f\left( [e^{A/2^m} e^{B/2^m}]^{2^m} \right) \right| \le f\left( [e^{A^*/2^m} e^{A/2^m}]^{2^{m-1}} [e^{B/2^m} e^{B^*/2^m}]^{2^{m-1}} \right). $$

Now let $m \to \infty$. Then, by the continuity of f and the Lie Product
Formula, we can conclude from the above inequality that

$$ |f(e^{A+B})| \le f(e^{(A^*+A)/2} e^{(B+B^*)/2}) = f(e^{\mathrm{Re}\, A} e^{\mathrm{Re}\, B}). $$
∎

Corollary IX.3.6 Let f be a function in the class T. Then

$$ |f(e^A)| \le f(e^{\mathrm{Re}\, A}) \quad \text{for all } A, \tag{IX.16} $$

and

$$ 0 \le f(e^{A+B}) \le f(e^A e^B) \quad \text{for Hermitian } A, B. \tag{IX.17} $$

Particularly noteworthy is the following special consequence.

Theorem IX.3.7 Let A, B be any two Hermitian matrices. Then

$$ |||e^{A+B}||| \le |||e^A e^B||| \tag{IX.18} $$

for every unitarily invariant norm.


Proof. Use (IX.17) for the special functions in Exercise IX.3.3(iv). This
gives the majorisation

$$ \lambda(e^{A+B}) \prec_w \lambda(e^A e^B). $$

But $\lambda(e^{A+B}) = s(e^{A+B})$ and $\lambda(e^A e^B) \prec_w s(e^A e^B)$. Hence

$$ s(e^{A+B}) \prec_w s(e^A e^B). $$

This proves the theorem. ∎


Choosing f(X) = tr X, we get from (IX.17) the famous Golden-Thompson
inequality: for Hermitian A, B we have

$$ \mathrm{tr}(e^{A+B}) \le \mathrm{tr}(e^A e^B). \tag{IX.19} $$
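
The Golden-Thompson inequality is easy to test numerically. The sketch below is illustrative only: it works with real symmetric 2 x 2 matrices, uses a truncated power series for the exponential, and introduces the hypothetical helper `gt_gap`, which returns $\mathrm{tr}(e^A e^B) - \mathrm{tr}(e^{A+B})$.

```python
import math
import random

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_exp(X, terms=40):
    """Truncated power series for exp(X); adequate for the small norms used here."""
    n = len(X)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, X)]
        result = mat_add(result, term)
    return result

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def gt_gap(A, B):
    """tr(exp(A) exp(B)) - tr(exp(A + B)); nonnegative for Hermitian A, B."""
    return trace(mat_mul(mat_exp(A), mat_exp(B))) - trace(mat_exp(mat_add(A, B)))

def random_symmetric(rng):
    a, b, d = (rng.uniform(-1, 1) for _ in range(3))
    return [[a, b], [b, d]]

rng = random.Random(2)
gaps = [gt_gap(random_symmetric(rng), random_symmetric(rng)) for _ in range(200)]

# A commuting (diagonal) pair gives equality; a non-commuting pair gives a gap.
commuting_gap = gt_gap([[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, -1.0]])
pauli_gap = gt_gap([[1.0, 0.0], [0.0, -1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

For the last pair both traces have closed forms: $\mathrm{tr}(e^A e^B) = 2\cosh^2 1$ and $\mathrm{tr}(e^{A+B}) = 2\cosh\sqrt{2}$, since $(A+B)^2 = 2I$.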


Exercise IX.3.8 Let A, B be Hermitian matrices. Show that for every
unitarily invariant norm

$$ \left|\left|\left| \left( \exp\frac{tB}{2} \exp tA \exp\frac{tB}{2} \right)^{1/t} \right|\right|\right| $$

decreases to $|||\exp(A + B)|||$ as $t \downarrow 0$. As a special consequence of this we
have a stronger version of the Golden-Thompson inequality:

$$ \mathrm{tr} \exp(A + B) \le \mathrm{tr} \left( \exp\frac{tB}{2} \exp tA \exp\frac{tB}{2} \right)^{1/t} \quad \text{for all } t > 0. $$

[Use Theorem IX.2.10 and Exercise IX.1.5.]


IX.4 Arithmetic-Geometric Mean Inequalities


The classical arithmetic-geometric mean inequality for numbers says that
$\sqrt{ab} \le \frac{1}{2}(a + b)$ for all positive numbers a, b. From this we see that for
complex numbers a, b we have $|ab| \le \frac{1}{2}(|a|^2 + |b|^2)$. In this section we
obtain some matrix versions of this inequality. Several corollaries of these
inequalities are derived in this and later sections.


Lemma IX.4.1 Let $Y_1$, $Y_2$ be any two positive matrices, and let $Y = Y_1 - Y_2$.
Let $Y = Y^+ - Y^-$ be the Jordan decomposition of the Hermitian matrix
Y. Then, for j = 1, 2, ..., n,

$$ \lambda_j(Y^+) \le \lambda_j(Y_1), \quad \lambda_j(Y^-) \le \lambda_j(Y_2). $$

(See Section IV.3 for the definition of the Jordan decomposition.)


Proof. Suppose $\lambda_j(Y)$ is nonnegative for j = 1, ..., p and negative for
j = p+1, ..., n. Then $\lambda_j(Y^+)$ is equal to $\lambda_j(Y)$ if j = 1, ..., p, and is zero
for j = p+1, ..., n.

Since $Y_1 = Y + Y_2 \ge Y$, we have $\lambda_j(Y_1) \ge \lambda_j(Y)$ for all j, by Weyl’s
Monotonicity Principle. Hence, $\lambda_j(Y_1) \ge \lambda_j(Y^+)$ for all j.

Since $Y_2 = Y_1 - Y \ge -Y$, we have $\lambda_j(Y_2) \ge \lambda_j(-Y)$ for all j. But
$\lambda_j(-Y) = \lambda_j(Y^-)$ for j = 1, ..., n-p and $\lambda_j(Y^-) = 0$ for j > n-p.
Hence, $\lambda_j(Y_2) \ge \lambda_j(Y^-)$ for all j. ∎


Theorem IX.4.2 Let A, B be any two matrices. Then

$$ s_j(A^*B) \le \frac{1}{2} s_j(AA^* + BB^*) \tag{IX.20} $$

for $1 \le j \le n$.

Proof. Let X be the 2n x 2n matrix $X = \begin{pmatrix} A & B \\ 0 & 0 \end{pmatrix}$. Then

$$ XX^* = \begin{pmatrix} AA^* + BB^* & 0 \\ 0 & 0 \end{pmatrix}, \quad X^*X = \begin{pmatrix} A^*A & A^*B \\ B^*A & B^*B \end{pmatrix}. $$

The off-diagonal part of $X^*X$ can be written as

$$ Y = \begin{pmatrix} 0 & A^*B \\ B^*A & 0 \end{pmatrix} = \frac{1}{2} \{ X^*X - U(X^*X)U^* \}, $$

where U is the unitary matrix $\begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}$. Note that both of the matrices in
the braces above are positive. Hence, by the preceding lemma,

$$ \lambda_j(Y^+) \le \frac{1}{2} \lambda_j(X^*X), \quad \lambda_j(Y^-) \le \frac{1}{2} \lambda_j(X^*X). $$

But $X^*X$ and $XX^*$ have the same eigenvalues. Hence, both $\lambda_j(Y^+)$ and
$\lambda_j(Y^-)$ are bounded above by $\frac{1}{2} s_j(AA^* + BB^*)$. Now note that, by Exercise
II.1.15, the eigenvalues of Y are the singular values of $A^*B$ together with
their negatives. Hence we have

$$ s_j(A^*B) \le \frac{1}{2} s_j(AA^* + BB^*). $$
∎


Corollary IX.4.3 Let A, B be any two matrices. Then there exists a
unitary matrix U such that

$$ |A^*B| \le \frac{1}{2} U(AA^* + BB^*)U^*. \tag{IX.21} $$


Corollary IX.4.4 Let A, B be any two matrices. Then

$$ |||A^*B||| \le \frac{1}{2} |||AA^* + BB^*||| \tag{IX.22} $$

for every unitarily invariant norm.


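The particular position of the stars in (IX.20), (IX.21), and (IX.22) is
not an accident. If we have

$$ A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}, $$

then $s_1(AB) = \sqrt{2}$, but $\frac{1}{2} s_1(AA^* + BB^*) = 1$.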
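
The two values quoted in this example can be confirmed by direct computation. The sketch below assumes the example is read as A with rows (1, 1), (0, 0) and B with rows (0, 0), (1, 1); that reading is an assumption, chosen because it reproduces both stated values. The helper names are hypothetical.

```python
import math

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def transpose(X):
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]

def top_singular_value(M):
    """s_1(M) for a real 2x2 matrix: sqrt of the largest eigenvalue of M^T M."""
    G = mat_mul(transpose(M), M)
    gap = math.sqrt((G[0][0] - G[1][1]) ** 2 + 4.0 * G[0][1] ** 2)
    return math.sqrt(0.5 * (G[0][0] + G[1][1] + gap))

A = [[1.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 1.0]]

# The matrices are real, so the adjoint is the transpose.
gram_sum = mat_add(mat_mul(A, transpose(A)), mat_mul(B, transpose(B)))
half_s1_sum = 0.5 * top_singular_value(gram_sum)      # (1/2) s_1(AA* + BB*)
s1_AB = top_singular_value(mat_mul(A, B))             # star removed: s_1(AB)
s1_AstarB = top_singular_value(mat_mul(transpose(A), B))  # s_1(A*B)
```

With these matrices $A^*B = 0$, so (IX.20) holds trivially, while the product AB without the star exceeds the bound.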

The presence of the unitary U in (IX.21) is also essential: it cannot be
replaced by the identity matrix even when A, B are Hermitian. This is
illustrated by the example


1 1 
ap-(3 2). 


A considerable strengthening of the inequality (IX.22) is given in the 
theorem below. 


Theorem IX.4.5 For any three matrices A, B, X, we have

$$ |||A^*XB||| \le \frac{1}{2} |||AA^*X + XBB^*||| \tag{IX.23} $$

for every unitarily invariant norm.


Proof. First consider the special case when A, B, X are Hermitian and
A = B. Then AXA is also Hermitian. So, by Proposition IX.1.2,

$$ |||AXA||| \le |||\mathrm{Re}(XA^2)||| = \frac{1}{2} |||A^2X + XA^2|||, \tag{IX.24} $$

which is just the desired inequality in this special case.


Next consider the more general situation, when A and B are Hermitian
and X is any matrix. Let

$$ T = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & X \\ X^* & 0 \end{pmatrix}. $$

Then, by the special case considered above,

$$ |||TYT||| \le \frac{1}{2} |||T^2Y + YT^2|||. \tag{IX.25} $$

Multiplying out the block-matrices, one sees that

$$ TYT = \begin{pmatrix} 0 & AXB \\ BX^*A & 0 \end{pmatrix}, \quad T^2Y + YT^2 = \begin{pmatrix} 0 & A^2X + XB^2 \\ B^2X^* + X^*A^2 & 0 \end{pmatrix}. $$

Hence, we obtain from (IX.25) the inequality

$$ |||AXB||| \le \frac{1}{2} |||A^2X + XB^2|||. \tag{IX.26} $$

Finally, let A, B, X be any matrices. Let $A = A_1U$, $B = B_1V$ be polar
decompositions of A and B. Then

$$ AA^*X + XBB^* = A_1^2X + XB_1^2, $$

while

$$ |||A^*XB||| = |||U^*A_1XB_1V||| = |||A_1XB_1|||. $$

So, the theorem follows from the inequality (IX.26). ∎
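
A quick numerical sanity check of (IX.23) is possible for one particular unitarily invariant norm. The sketch below is an illustration under stated assumptions: it uses the Frobenius norm (chosen because it needs no singular value computation), random complex 3 x 3 matrices, and hypothetical helper names.

```python
import random

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def adjoint(X):
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]

def frobenius(X):
    """Frobenius norm, one example of a unitarily invariant norm."""
    return sum(abs(x) ** 2 for row in X for x in row) ** 0.5

def random_complex(rng, n):
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]

def amgm_sides(A, B, X):
    """Return (|A* X B|_F, (1/2) |A A* X + X B B*|_F), the two sides of (IX.23)."""
    lhs = frobenius(mat_mul(mat_mul(adjoint(A), X), B))
    rhs = 0.5 * frobenius(mat_add(mat_mul(mat_mul(A, adjoint(A)), X),
                                  mat_mul(X, mat_mul(B, adjoint(B)))))
    return lhs, rhs

rng = random.Random(1)
samples = [amgm_sides(random_complex(rng, 3), random_complex(rng, 3),
                      random_complex(rng, 3)) for _ in range(200)]
```

Each pair in `samples` should have its first entry no larger than its second. The check covers only the Frobenius norm; the theorem itself applies to every unitarily invariant norm.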


Exercise IX.4.6 Another proof of the theorem can be obtained as follows.
First prove the inequality (IX.24) for Hermitian A and X. Then, for
arbitrary A, B and X, let T and Y be the matrices

$$ T = \begin{pmatrix} 0 & 0 & A & 0 \\ 0 & 0 & 0 & B \\ A^* & 0 & 0 & 0 \\ 0 & B^* & 0 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & X & 0 & 0 \\ X^* & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, $$

and apply the special case to them.


Exercise IX.4.7 Construct an example to show that the inequality (IX.20)
cannot be strengthened in the way that (IX.23) strengthens (IX.22).


When A, B are both positive, we can prove a result stronger than (IX.26).
This is the next theorem.

Theorem IX.4.8 Let A, B be positive matrices and let X be any matrix.
Then, for each unitarily invariant norm, the function

$$ f(t) = |||A^{1+t} X B^{1-t} + A^{1-t} X B^{1+t}||| \tag{IX.27} $$

is convex on the interval [-1, 1] and attains its minimum at t = 0.


Proof. Without loss of generality, we may assume that A > 0 and B > 0.
Since f is continuous and f(t) = f(-t), both conclusions will follow if we
show that $f(t) \le \frac{1}{2}[f(t+s) + f(t-s)]$, whenever $t \pm s$ are in [-1, 1].
For each t, let $M_t$ be the mapping

$$ M_t(Y) = \frac{1}{2}(A^t Y B^{-t} + A^{-t} Y B^t). $$

For each Y, we have

$$ |||Y||| = |||A^t (A^{-t} Y B^{-t}) B^t||| \le \frac{1}{2} |||A^t Y B^{-t} + A^{-t} Y B^t||| $$

by Theorem IX.4.5. Thus $|||Y||| \le |||M_t(Y)|||$. From this it follows that

$$ |||M_t(AXB)||| \le |||M_s M_t(AXB)||| \quad \text{for all } s, t. $$

But,

$$ M_s M_t = \frac{1}{2}(M_{t+s} + M_{t-s}). $$

So we have

$$ |||M_t(AXB)||| \le \frac{1}{2} \{ |||M_{t+s}(AXB)||| + |||M_{t-s}(AXB)||| \}. $$

Since $|||M_t(AXB)||| = \frac{1}{2} f(t)$, this shows that

$$ f(t) \le \frac{1}{2}[f(t+s) + f(t-s)]. $$

This proves the theorem. ∎


Corollary IX.4.9 Let A, B be positive matrices and X any matrix. Then,
for each unitarily invariant norm, the function

$$ g(\nu) = |||A^{\nu} X B^{1-\nu} + A^{1-\nu} X B^{\nu}||| \tag{IX.28} $$

is convex on [0, 1].

Proof. Replace A, B by $A^{1/2}$, $B^{1/2}$ in (IX.27). Then put $\nu = \frac{1+t}{2}$. ∎
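
The shape of the function g in (IX.28) can be explored numerically. The sketch below makes two simplifying assumptions that are not in the text: the norm is the Frobenius norm, and A and B are diagonal with positive entries. The second assumption loses no generality for a unitarily invariant norm, because positive matrices can be diagonalised by unitaries and the unitaries absorbed into X. The function name `heinz_norm` and the sample data are hypothetical.

```python
def heinz_norm(a, b, X, v):
    """Frobenius norm of A^v X B^(1-v) + A^(1-v) X B^v for A = diag(a), B = diag(b)."""
    total = 0.0
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # For diagonal A and B the (i, j) entry of X is simply rescaled.
            weight = ai ** v * bj ** (1 - v) + ai ** (1 - v) * bj ** v
            total += (weight * abs(X[i][j])) ** 2
    return total ** 0.5

# Sample data chosen only for illustration.
a = [0.5, 2.0, 3.0]
b = [1.0, 4.0, 0.25]
X = [[1.0, -2.0, 0.5],
     [0.3, 1.0, -1.0],
     [2.0, 0.0, 1.0]]

grid = [k / 20 for k in range(21)]
values = [heinz_norm(a, b, X, v) for v in grid]
```

On this grid the values are symmetric about 1/2, smallest at 1/2, and largest at the endpoints 0 and 1, where g equals the Frobenius norm of AX + XB. This is the behaviour expected of a convex function with g(v) = g(1 - v).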


Corollary IX.4.10 Let A, B be positive matrices and let X be any matrix.
Then, for $0 \le \nu \le 1$ and for every unitarily invariant norm,

$$ |||A^{\nu} X B^{1-\nu} + A^{1-\nu} X B^{\nu}||| \le |||AX + XB|||. \tag{IX.29} $$

Proof. Let $g(\nu)$ be the function defined in (IX.28). Note that g(1) = g(0).
So the assertion follows from the convexity of g. ∎


IX.5 Schwarz Inequalities


In this section we shall prove some inequalities that can be considered to
be matrix versions of the Cauchy-Schwarz inequality.

Some inequalities of this kind have already been proved in Chapter 4.
Let A, B be any two matrices and r any positive real number. Then, we
saw that

$$ ||| \, |A^*B|^r \, |||^2 \le |||(AA^*)^r||| \; |||(BB^*)^r||| \tag{IX.30} $$

for every unitarily invariant norm. The choice $r = \frac{1}{2}$ gives the inequality

$$ ||| \, |A^*B|^{1/2} \, |||^2 \le |||A||| \; |||B|||, \tag{IX.31} $$

while the choice r = 1 gives

$$ |||A^*B|||^2 \le |||AA^*||| \; |||BB^*|||. \tag{IX.32} $$

See Exercise IV.2.7 and Problem IV.5.7. It was noted there that the
inequality (IX.32) is included in (IX.31).

We will now obtain more general versions of these in the same spirit as of 
Theorem IX.4.5. The generalisation of (IX.32) is proved easily and is given 
first, even though this is subsumed in the theorem that follows it. 


Theorem IX.5.1 Let A, B, X be any three matrices. Then, for every
unitarily invariant norm,

$$ |||A^*XB|||^2 \le |||AA^*X||| \; |||XBB^*|||. \tag{IX.33} $$


Proof. First assume that X is a positive matrix. Then

$$ \begin{aligned}
|||A^*XB|||^2 &= |||A^* X^{1/2} X^{1/2} B|||^2 = |||(X^{1/2}A)^* (X^{1/2}B)|||^2 \\
&\le |||X^{1/2} AA^* X^{1/2}||| \; |||X^{1/2} BB^* X^{1/2}|||,
\end{aligned} $$

using the inequality (IX.32). Now use Proposition IX.1.1 to conclude that

$$ |||A^*XB|||^2 \le |||AA^*X||| \; |||XBB^*|||. $$

This proves the theorem in this special case. Now let X be any matrix, and
let X = UP be its polar decomposition. Then, by unitary invariance,

$$ \begin{aligned}
|||A^*XB||| &= |||A^*UPB||| = |||U^*A^*UPB|||, \\
|||AA^*X||| &= |||AA^*UP||| = |||U^*AA^*UP|||, \\
|||XBB^*||| &= |||UPBB^*||| = |||PBB^*|||.
\end{aligned} $$

So, the general theorem follows by applying the special case to the triple
$U^*AU$, B, P. ∎
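
The inequality (IX.33) can be spot-checked in the same spirit. The sketch below is illustrative only and rests on stated assumptions: the Frobenius norm stands in for a general unitarily invariant norm, the matrices are random complex 3 x 3 samples, and the helper names are hypothetical.

```python
import random

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(X):
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]

def frobenius(X):
    """Frobenius norm, one example of a unitarily invariant norm."""
    return sum(abs(x) ** 2 for row in X for x in row) ** 0.5

def random_complex(rng, n):
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]

def schwarz_sides(A, B, X):
    """Return (|A* X B|_F^2, |A A* X|_F |X B B*|_F), the two sides of (IX.33)."""
    lhs = frobenius(mat_mul(mat_mul(adjoint(A), X), B)) ** 2
    rhs = (frobenius(mat_mul(mat_mul(A, adjoint(A)), X))
           * frobenius(mat_mul(X, mat_mul(B, adjoint(B)))))
    return lhs, rhs

rng = random.Random(3)
samples = [schwarz_sides(random_complex(rng, 3), random_complex(rng, 3),
                         random_complex(rng, 3)) for _ in range(200)]
```

As with any random test, agreement on the samples is only a consistency check and says nothing about norms other than the one used.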


The corresponding generalisation of the inequality (IX.30) is proved in
the next theorem.

Theorem IX.5.2 Let A, B, X be any three matrices. Then, for every
positive real number r, and for every unitarily invariant norm, we have

$$ ||| \, |A^*XB|^r \, |||^2 \le ||| \, |AA^*X|^r \, ||| \; ||| \, |XBB^*|^r \, |||. \tag{IX.34} $$

Proof. Let X = UP be the polar decomposition of X. Then
$A^*XB = A^*UPB = (P^{1/2}U^*A)^* P^{1/2}B$. So, from the inequality (IX.30), we have

$$ ||| \, |A^*XB|^r \, |||^2 \le |||(P^{1/2}U^*AA^*UP^{1/2})^r||| \; |||(P^{1/2}BB^*P^{1/2})^r|||. \tag{IX.35} $$

Now note that

$$ \lambda^r(P^{1/2}U^*AA^*UP^{1/2}) = \lambda^r(AA^*UPU^*). $$

Using Theorem IX.2.9, we have

$$ \lambda^r(AA^*UPU^*) \prec_w \lambda^{r/2}([AA^*]^2[UPU^*]^2). $$

But $(UPU^*)^2 = UP^2U^* = XX^*$. Hence,

$$ \lambda^{r/2}([AA^*]^2[UPU^*]^2) = s^r(AA^*X). $$

Thus,

$$ \lambda^r(P^{1/2}U^*AA^*UP^{1/2}) \prec_w s^r(AA^*X), $$

and hence

$$ |||(P^{1/2}U^*AA^*UP^{1/2})^r||| \le ||| \, |AA^*X|^r \, |||. \tag{IX.36} $$

In the same way, we have

$$ \begin{aligned}
\lambda^r(P^{1/2}BB^*P^{1/2}) = \lambda^r(PBB^*) &\prec_w \lambda^{r/2}(P^2[BB^*]^2) \\
&= s^r(PBB^*) = s^r(XBB^*).
\end{aligned} $$

Hence

$$ |||(P^{1/2}BB^*P^{1/2})^r||| \le ||| \, |XBB^*|^r \, |||. \tag{IX.37} $$

Combining the inequalities (IX.35), (IX.36), and (IX.37) we get (IX.34). ∎

The following corollary of Theorem IX.5.1 should be compared with 
(IX.29). 


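Corollary IX.5.3 Let A, B be positive matrices and let X be any matrix.
Then, for $0 \le \nu \le 1$, and for every unitarily invariant norm

$$ |||A^{\nu} X B^{1-\nu}||| \le |||AX|||^{\nu} \; |||XB|||^{1-\nu}. \tag{IX.38} $$


Proof. For $\nu = 0, 1$, the inequality (IX.38) is a trivial statement. For
$\nu = \frac{1}{2}$ it reduces to the inequality (IX.33). We will prove it, by induction,
for all indices $\nu = k/2^n$, $k = 0, 1, \ldots, 2^n$. The general case then follows by
continuity.

Let $\nu = \frac{2k+1}{2^n}$ be any dyadic rational. Then $\nu = \mu + \rho$, where
$\mu = \frac{k}{2^{n-1}}$, $\rho = \frac{1}{2^n}$. Suppose that the inequality (IX.38) is valid for all dyadic
rationals with denominator $2^{n-1}$. Two such rationals are $\mu$ and $\lambda = \mu + 2\rho = \nu + \rho$.
Then, using the inequality (IX.33) and this induction hypothesis, we
have

$$ \begin{aligned}
|||A^{\nu} X B^{1-\nu}||| &= |||A^{\mu+\rho} X B^{1-\lambda+\rho}||| \\
&= |||A^{\rho} (A^{\mu} X B^{1-\lambda}) B^{\rho}||| \\
&\le |||A^{2\rho} A^{\mu} X B^{1-\lambda}|||^{1/2} \; |||A^{\mu} X B^{1-\lambda} B^{2\rho}|||^{1/2} \\
&= |||A^{\lambda} X B^{1-\lambda}|||^{1/2} \; |||A^{\mu} X B^{1-\mu}|||^{1/2} \\
&\le |||AX|||^{\lambda/2} \, |||XB|||^{(1-\lambda)/2} \, |||AX|||^{\mu/2} \, |||XB|||^{(1-\mu)/2} \\
&= |||AX|||^{(\lambda+\mu)/2} \, |||XB|||^{1-(\lambda+\mu)/2} \\
&= |||AX|||^{\nu} \, |||XB|||^{1-\nu}.
\end{aligned} $$

This proves that the desired inequality holds for all dyadic rationals. ∎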


Corollary IX.5.4 Let A, B be positive matrices and let X be any matrix.
Then, for $0 \le \nu \le 1$, and for every unitarily invariant norm

$$ |||A^{\nu} X B^{\nu}||| \le |||X|||^{1-\nu} \; |||AXB|||^{\nu}. \tag{IX.39} $$


Proof. Assume without loss of generality that A is invertible; the general
case follows from this by continuity. We have, using (IX.38),

$$ \begin{aligned}
|||A^{\nu} X B^{\nu}||| &= |||(A^{-1})^{1-\nu} AX B^{\nu}||| \\
&\le |||A^{-1}AX|||^{1-\nu} \; |||AXB|||^{\nu} \\
&= |||X|||^{1-\nu} \; |||AXB|||^{\nu}.
\end{aligned} $$
∎

Note that the inequality (IX.5) is a very special case of (IX.39).

Exercise IX.5.5 Since $|||AA^*||| = |||A^*A|||$, the stars in the inequality
(IX.32) could have been placed differently. Much less freedom is allowed for
the generalisation (IX.33). Find a 2 x 2 example in which $|||A^*XB|||^2$ is
larger than $|||A^*AX||| \; |||XBB^*|||$.


Apart from norms, there are other interesting functions for which Schwarz- 
like inequalities can be obtained. This is done below. It is convenient to have 
a name for the class of functions we shall study. 


Definition IX.5.6 A continuous complex-valued function f on the space
of matrices will be said to belong to the class L if it satisfies the following
two conditions:

(i) $f(B) \ge f(A) \ge 0$ if $B \ge A \ge 0$.
(ii) $|f(A^*B)|^2 \le f(A^*A) f(B^*B)$ for all A, B.

We have seen above that every unitarily invariant norm is a function in
the class L. Other examples are given below.


Exercise IX.5.7 (i) The functions trace and determinant are in L.

(ii) The function spectral radius is in L.

(iii) If f is a function defined on matrices of order $\binom{n}{k}$ and is in L, then
the function $g(A) = f(\wedge^k A)$ defined on matrices of order n is also in L.

(iv) The functions $\varphi_k(A) = \mathrm{tr}\, \wedge^k A$, $1 \le k \le n$, are in L. (These are the
coefficients in the characteristic polynomial of A.)

(v) If $s_j(A)$, $1 \le j \le n$, are the singular values of A, then for each
$1 \le k \le n$ the function $f_k(A) = \prod_{j=1}^{k} s_j(A)$ is in L.

(vi) If $\lambda_j(A)$ denote the eigenvalues of A arranged as $|\lambda_1(A)| \ge \cdots \ge |\lambda_n(A)|$,
then for $1 \le k \le n$ the function $f_k(A) = \prod_{j=1}^{k} \lambda_j(A)$ is in L.

Exercise IX.5.8 Another class of functions T was introduced in IX.3.2.
The two classes T and L have several elements in common. Find examples
to show that neither of them is contained in the other.


A different characterisation of the class £ is obtained below. For this we 
need the following theorem, which is also useful in other contexts. 
Theorem IX.5.9 Let A, B be positive operators on H, and let C be any
operator on H. Then the operator $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ on $H \oplus H$ is positive if and only
if there exists a contraction K on H such that $C = B^{1/2} K A^{1/2}$.


Proof. By Proposition I.3.5, K is a contraction if and only if $\begin{pmatrix} I & K^* \\ K & I \end{pmatrix}$ is
positive. The positivity of this matrix implies the positivity of the matrix

$$ \begin{pmatrix} A^{1/2} & 0 \\ 0 & B^{1/2} \end{pmatrix} \begin{pmatrix} I & K^* \\ K & I \end{pmatrix} \begin{pmatrix} A^{1/2} & 0 \\ 0 & B^{1/2} \end{pmatrix} = \begin{pmatrix} A & A^{1/2} K^* B^{1/2} \\ B^{1/2} K A^{1/2} & B \end{pmatrix}. $$

(See Lemma V.1.5.) This proves one of the asserted implications. To prove
the converse, first note that if A and B are invertible, then the argument
can be reversed; and then note that the general case can be obtained from
this by continuity. ∎

Theorem IX.5.10 A (continuous) function f is in the class L if and only
if it satisfies the following two conditions:

(a) $f(A) \ge 0$ for all $A \ge 0$.
(b) $|f(C)|^2 \le f(A) f(B)$ for all A, B, C such that $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ is positive.


Proof. If f satisfies condition (i) in Definition IX.5.6, then it certainly
satisfies the condition (a) above. Further, if $\begin{pmatrix} A & C^* \\ C & B \end{pmatrix}$ is positive, then, by
the preceding theorem, $C = B^{1/2} K A^{1/2}$, where K is a contraction. So, if
f satisfies condition (ii) in IX.5.6, then

$$ |f(C)|^2 = |f(B^{1/2} K A^{1/2})|^2 \le f(B) f(A^{1/2} K^* K A^{1/2}). $$

Since $A^{1/2} K^* K A^{1/2} \le A$, we also have $f(A^{1/2} K^* K A^{1/2}) \le f(A)$ from the
condition (i) in IX.5.6.

Now suppose f satisfies conditions (a) and (b). Let $B \ge A \ge 0$. Write

$$ \begin{pmatrix} B & A \\ A & B \end{pmatrix} = \begin{pmatrix} B - A & 0 \\ 0 & B - A \end{pmatrix} + \begin{pmatrix} A & A \\ A & A \end{pmatrix}. $$

The first matrix in this sum is obviously positive; the second is also positive
by Corollary I.3.3. Thus the sum is also positive. So it follows from (a) and
(b) that $f(A) \le f(B)$. Next note that we can write, for any A and B,

$$ \begin{pmatrix} A^*A & A^*B \\ B^*A & B^*B \end{pmatrix} = \begin{pmatrix} A^* & 0 \\ B^* & 0 \end{pmatrix} \begin{pmatrix} A & B \\ 0 & 0 \end{pmatrix}. $$

Since the two matrices on the right-hand side are adjoints of each other,
their product is positive. Hence we have

$$ |f(A^*B)|^2 \le f(A^*A) f(B^*B). $$

This shows that f is in L. ∎


This characterisation leads to an easy proof of the following theorem of 
E.H. Lieb. 


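Theorem IX.5.11 Let $A_1, \ldots, A_m$ and $B_1, \ldots, B_m$ be any matrices. Then,
for every function f in the class L,

$$ \left| f\left( \sum A_i^* B_i \right) \right|^2 \le f\left( \sum A_i^* A_i \right) f\left( \sum B_i^* B_i \right), \tag{IX.40} $$

$$ \left| f\left( \sum A_i \right) \right|^2 \le f\left( \sum |A_i| \right) f\left( \sum |A_i^*| \right). \tag{IX.41} $$

Proof. For each i = 1, ..., m, the matrix $\begin{pmatrix} A_i^*A_i & A_i^*B_i \\ B_i^*A_i & B_i^*B_i \end{pmatrix}$ is positive,
being the product of $\begin{pmatrix} A_i^* & 0 \\ B_i^* & 0 \end{pmatrix}$ and its adjoint. The sum of these matrices is,
therefore, also positive. So the inequality (IX.40) follows from Theorem
IX.5.10.

Each of the matrices $\begin{pmatrix} |A_i| & A_i^* \\ A_i & |A_i^*| \end{pmatrix}$ is also positive; see Corollary I.3.4. So
the inequality (IX.41) follows by the same argument. ∎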


Exercise IX.5.12 For any two matrices A, B, we have
$\mathrm{tr}(|A + B|) \le \mathrm{tr}(|A| + |B|)$. Show by an example that this inequality is not true if tr
is replaced by det. Show that we have

$$ [\det(|A + B|)]^2 \le \det(|A| + |B|) \det(|A^*| + |B^*|). \tag{IX.42} $$

A similar inequality holds for every function in L.

Let f(A, B) be a real valued function of two matrix variables. Then, f is
called jointly concave, if for all $0 \le \alpha \le 1$,

$$ f(\alpha A_1 + (1 - \alpha) A_2, \alpha B_1 + (1 - \alpha) B_2) \ge \alpha f(A_1, B_1) + (1 - \alpha) f(A_2, B_2) $$

for all $A_1, A_2, B_1, B_2$.
In this section we will prove the following theorem due to E.H. Lieb. The 
importance of the theorem, and its consequences, are explained later. 


Theorem IX.6.1 (Lieb) For each matrix X and each real number
$0 \le t \le 1$, the function

$$ f(A, B) = \mathrm{tr}\, X^* A^t X B^{1-t} $$

is jointly concave on pairs of positive matrices.


Note that f(A, B) is positive if A, B are positive. 
To prove this theorem we need the following lemma. 


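Lemma IX.6.2 Let $R_1, R_2, S_1, S_2, T_1, T_2$ be positive operators on a
Hilbert space. Suppose $R_1$ commutes with $R_2$, $S_1$ commutes with $S_2$,
and $T_1$ commutes with $T_2$, and

\[
  R_1 \ge S_1 + T_1, \qquad R_2 \ge S_2 + T_2. \tag{IX.43}
\]

Then, for $0 \le t \le 1$,

\[
  R_1^t R_2^{1-t} \ge S_1^t S_2^{1-t} + T_1^t T_2^{1-t}. \tag{IX.44}
\]

Proof. Let $E$ be the set of all $t$ in $[0,1]$ for which the inequality
(IX.44) is true. It is clear that $E$ is a closed set and it contains 0 and 1.
We will first show that $E$ contains the point $\frac{1}{2}$, then use this to
show that $E$ is a convex set. This would prove the lemma.

Let $x, y$ be any two vectors. Then

\[
\begin{aligned}
  |\langle x, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2})\,y\rangle|
  &\le \|S_1^{1/2}x\|\,\|S_2^{1/2}y\| + \|T_1^{1/2}x\|\,\|T_2^{1/2}y\| \\
  &\le \bigl[\|S_1^{1/2}x\|^2 + \|T_1^{1/2}x\|^2\bigr]^{1/2}
       \bigl[\|S_2^{1/2}y\|^2 + \|T_2^{1/2}y\|^2\bigr]^{1/2}
\end{aligned}
\]

by the Cauchy-Schwarz inequality. This last expression can be rewritten as

\[
  [\langle x, (S_1 + T_1)x\rangle]^{1/2}\,[\langle y, (S_2 + T_2)y\rangle]^{1/2}.
\]

Hence, by the hypothesis, we have

\[
  |\langle x, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2})\,y\rangle|
  \le [\langle x, R_1 x\rangle\,\langle y, R_2 y\rangle]^{1/2}.
\]

Using this, we see that, for all unit vectors $u$ and $v$,

\[
\begin{aligned}
  |\langle u, R_1^{-1/2}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2})R_2^{-1/2}v\rangle|
  &= |\langle R_1^{-1/2}u, (S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2})R_2^{-1/2}v\rangle| \\
  &\le [\langle R_1^{-1/2}u, R_1^{1/2}u\rangle\,\langle R_2^{-1/2}v, R_2^{1/2}v\rangle]^{1/2} \\
  &= 1.
\end{aligned}
\]

This shows that

\[
  \|R_1^{-1/2}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2})R_2^{-1/2}\| \le 1.
\]

Using Proposition IX.1.1, the commutativity of $R_1$ and $R_2$, and the
inequality above, we see that

\[
  \|R_1^{-1/4}R_2^{-1/4}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2})R_2^{-1/4}R_1^{-1/4}\| \le 1.
\]

This is equivalent to the operator inequality

\[
  R_1^{-1/4}R_2^{-1/4}(S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2})R_2^{-1/4}R_1^{-1/4} \le I.
\]

From this, it follows that

\[
  S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2} \le R_1^{1/2}R_2^{1/2}.
\]

(See Lemma V.1.5.) This shows that the set $E$ contains the point 1/2.

Now suppose that $\mu$ and $\nu$ are in $E$. Then

\[
\begin{aligned}
  R_1^{\mu}R_2^{1-\mu} &\ge S_1^{\mu}S_2^{1-\mu} + T_1^{\mu}T_2^{1-\mu}, \\
  R_1^{\nu}R_2^{1-\nu} &\ge S_1^{\nu}S_2^{1-\nu} + T_1^{\nu}T_2^{1-\nu}.
\end{aligned}
\]

These two inequalities are exactly of the form (IX.43). Hence, using the
special case of the lemma that has been proved above, we have,

\[
  (R_1^{\mu}R_2^{1-\mu})^{1/2}(R_1^{\nu}R_2^{1-\nu})^{1/2}
  \ge (S_1^{\mu}S_2^{1-\mu})^{1/2}(S_1^{\nu}S_2^{1-\nu})^{1/2}
    + (T_1^{\mu}T_2^{1-\mu})^{1/2}(T_1^{\nu}T_2^{1-\nu})^{1/2}.
\]

Using the hypothesis about commutativity, we see from this inequality that
$\frac{1}{2}(\mu + \nu)$ is in $E$. This shows that $E$ is convex. ∎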
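For orientation, it may help to look at the lemma when all six operators are positive numbers, so that the commutativity hypotheses hold trivially. This scalar illustration is ours and is not needed in the sequel. For $0 < t < 1$, Hölder's inequality with the conjugate exponents $p = 1/t$ and $q = 1/(1-t)$, applied to the pairs $(s_1^t, t_1^t)$ and $(s_2^{1-t}, t_2^{1-t})$, gives

\[
  s_1^t s_2^{1-t} + t_1^t t_2^{1-t}
  \le (s_1 + t_1)^t (s_2 + t_2)^{1-t}
  \le r_1^t r_2^{1-t}
\]

whenever $r_1 \ge s_1 + t_1$ and $r_2 \ge s_2 + t_2$. In this sense (IX.44) can be read as an operator version of Hölder's inequality.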


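Proof of Theorem IX.6.1: Let $A, B$ be positive operators on $\mathcal{H}$.
Let $\mathcal{A}$ and $\mathcal{B}$ be the left and right multiplication
operators on the space $\mathcal{L}(\mathcal{H})$ induced by $A$ and $B$; i.e.,
$\mathcal{A}(X) = AX$ and $\mathcal{B}(X) = XB$. Using the results of
Exercise I.4.4 and Problem VII.6.10, one sees that $\mathcal{A}$ and
$\mathcal{B}$ are positive operators on (the Hilbert space)
$\mathcal{L}(\mathcal{H})$.

Now, suppose $A_1, A_2, B_1, B_2$ are positive operators on $\mathcal{H}$. Let
$A = A_1 + A_2$, $B = B_1 + B_2$. Let $\mathcal{A}_1, \mathcal{A}_2, \mathcal{A}$
denote the left multiplication operators on $\mathcal{L}(\mathcal{H})$ induced
by $A_1, A_2$, and $A$, respectively, and $\mathcal{B}_1, \mathcal{B}_2,
\mathcal{B}$ the right multiplication operators induced by $B_1, B_2$, and $B$,
respectively. Then $\mathcal{A} = \mathcal{A}_1 + \mathcal{A}_2$,
$\mathcal{B} = \mathcal{B}_1 + \mathcal{B}_2$. Hence, by Lemma IX.6.2,

\[
  \mathcal{A}^t\mathcal{B}^{1-t}
  \ge \mathcal{A}_1^t\mathcal{B}_1^{1-t} + \mathcal{A}_2^t\mathcal{B}_2^{1-t},
\]

for $0 \le t \le 1$. This is the same as saying that for every $X$ in
$\mathcal{L}(\mathcal{H})$

\[
  \langle X, A^t X B^{1-t}\rangle
  \ge \langle X, A_1^t X B_1^{1-t} + A_2^t X B_2^{1-t}\rangle,
\]

or that

\[
  \operatorname{tr} X^*A^tXB^{1-t}
  \ge \operatorname{tr} X^*A_1^tXB_1^{1-t} + \operatorname{tr} X^*A_2^tXB_2^{1-t}.
\]

From this, it follows that

\[
  f\Bigl(\frac{A_1 + A_2}{2}, \frac{B_1 + B_2}{2}\Bigr)
  \ge \frac{1}{2} f(A_1, B_1) + \frac{1}{2} f(A_2, B_2).
\]

This shows that $f$ is concave. ∎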
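The multiplication operators used in this proof can be written down concretely. The following sketch assumes matrices are flattened in row-major order, in which case left multiplication by $A$ is represented by the Kronecker product $A \otimes I$ and right multiplication by $B$ by $I \otimes B^T$. It checks that the two commute and that the quadratic form of $\mathcal{A}^t\mathcal{B}^{1-t}$ reproduces the trace in Theorem IX.6.1. The function names are hypothetical and real matrices are used only to keep the example short.

```python
import numpy as np

def left_mult(a):
    """Matrix of X -> A X acting on the row-major flattening of X."""
    return np.kron(a, np.eye(a.shape[0]))

def right_mult(b):
    """Matrix of X -> X B acting on the row-major flattening of X."""
    return np.kron(np.eye(b.shape[0]), b.T)

def mpow(m, t):
    """M^t for a symmetric positive definite M, via its spectral decomposition."""
    w, v = np.linalg.eigh(m)
    return (v * w**t) @ v.T

def demo(n=3, t=0.4, seed=0):
    rng = np.random.default_rng(seed)
    g, h = rng.standard_normal((2, n, n))
    a, b = g @ g.T + np.eye(n), h @ h.T + np.eye(n)   # real positive definite
    x = rng.standard_normal((n, n))
    la, rb = left_mult(a), right_mult(b)
    commutator = np.linalg.norm(la @ rb - rb @ la)
    # <X, A^t X B^(1-t)> computed through the multiplication operators
    quad = x.ravel() @ mpow(la, t) @ mpow(rb, 1 - t) @ x.ravel()
    # the same number computed as a trace
    trace = np.trace(x.T @ mpow(a, t) @ x @ mpow(b, 1 - t))
    return commutator, quad, trace
```

Because $\mathcal{A}$ and $\mathcal{B}$ commute, Lemma IX.6.2 applies to them even though $A$ and $B$ themselves need not commute.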


Another proof of this theorem is outlined in Problem IX.8.17.
Using the identification of $\mathcal{L}(\mathcal{H})$ with
$\mathcal{H} \otimes \mathcal{H}$, Lieb's Theorem can be seen to be equivalent
to the following theorem.


Theorem IX.6.3 (T. Ando) For each $0 \le t \le 1$, the map
$(A, B) \to A^t \otimes B^{1-t}$ is jointly concave on pairs of positive
operators on $\mathcal{H}$.

Exercise IX.6.4 Let $t_1, t_2$ be two positive numbers such that
$t_1 + t_2 \le 1$. Show that the map $(A, B) \to A^{t_1} \otimes B^{t_2}$ is
jointly concave on pairs of positive operators on $\mathcal{H}$. [Hint: The map
$A \to A^s$ is monotone and concave on positive operators for
$0 \le s \le 1$. See Chapter 5.]


Lieb proved his theorem in connection with problems connected with 
entropy in quantum mechanics. This is explained below (with some simpli- 
fications). 

The function $S(A) = -\operatorname{tr} A \log A$, where $A$ is a positive
matrix, is called the entropy function. This is a concave function on the set
of positive operators. In fact, we have seen that the function
$f(t) = -t \log t$ is operator concave on $(0, \infty)$.

Let $K$ be a given Hermitian operator. The entropy of $A$ relative to $K$
is defined as

\[
  S(A, K) = \frac{1}{2} \operatorname{tr}\,[A^{1/2}, K]^2,
\]

where $[X, Y]$ stands for the Lie bracket (or the commutator) $XY - YX$. This
concept was introduced by Wigner and Yanase, and extended by Dyson,
who considered the functions

\[
  S_t(A, K) = \frac{1}{2} \operatorname{tr}\bigl([A^t, K][A^{1-t}, K]\bigr), \tag{IX.45}
\]

$0 < t < 1$.

The Wigner-Yanase-Dyson conjecture said that $S_t(A, K)$ is concave in
$A$ on the set of positive matrices. Lieb's Theorem implies that this is true.
To see this note that

\[
  S_t(A, K) = \operatorname{tr}(K A^t K A^{1-t} - K^2 A). \tag{IX.46}
\]

Since the function $g(A) = -\operatorname{tr} K^2 A$ is linear in $A$, it is
also concave. Hence, concavity of $S_t(A, K)$ follows from that of
$\operatorname{tr} K A^t K A^{1-t}$. But that is a special case of Lieb's
Theorem.
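The passage from (IX.45) to (IX.46) is a short computation with the cyclicity of the trace; we spell it out here as a worked step. Expanding the product of the two commutators,

\[
  [A^t, K][A^{1-t}, K]
  = A^t K A^{1-t} K - A^t K^2 A^{1-t} - K A K + K A^t K A^{1-t}.
\]

Under the trace, the first and the last terms both equal $\operatorname{tr} K A^t K A^{1-t}$, while the two middle terms both equal $\operatorname{tr} K^2 A$. Hence

\[
  \frac{1}{2}\operatorname{tr}\bigl([A^t, K][A^{1-t}, K]\bigr)
  = \operatorname{tr} K A^t K A^{1-t} - \operatorname{tr} K^2 A,
\]

which is (IX.46).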

Given any operator $X$, define

\[
  I_t(A, X) = \operatorname{tr}(X^* A^t X A^{1-t} - X^* X A), \tag{IX.47}
\]

$0 \le t \le 1$. Note that $I_0(A, X) = 0$. When $X$ is Hermitian, this reduces
to the function $S_t$ defined earlier. Lieb's Theorem implies that $I_t(A, X)$
is a concave function of $A$. Hence, the function $I(A, X)$ defined as

\[
  I(A, X) = \frac{d}{dt}\Big|_{t=0} I_t(A, X)
          = \operatorname{tr}\bigl(X^*(\log A) X A - X^* X (\log A) A\bigr) \tag{IX.48}
\]

is also concave.


Let $A, B$ be positive matrices. The relative entropy of $A$ and $B$ is
defined as

\[
  S(A|B) = \operatorname{tr}(A(\log A - \log B)). \tag{IX.49}
\]

This notion was introduced by Umegaki, and generalised by Araki to the
von Neumann algebra setting.

Theorem IX.6.5 (Lindblad) The function $S(A|B)$ defined above is jointly
convex in $A, B$.

Proof. Consider the block-matrices

\[
  T = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}, \qquad
  X = \begin{pmatrix} 0 & 0 \\ I & 0 \end{pmatrix}.
\]

Then note that

\[
  S(A|B) = -I(T, X).
\]

We noted earlier that $I(T, X)$ is concave in the argument $T$. ∎
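The relative entropy is easy to evaluate numerically, and the block-matrix identity behind the proof can be checked directly. The sketch below assumes strictly positive definite inputs (so that the logarithms exist) and takes $T = \operatorname{diag}(A, B)$ and $X$ with a single identity block in its lower left corner; these choices and all function names are our assumptions for the illustration.

```python
import numpy as np

def mlog(a):
    """log A for a Hermitian positive definite A."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def rel_entropy(a, b):
    """S(A|B) = tr A (log A - log B), as in (IX.49)."""
    return np.trace(a @ (mlog(a) - mlog(b))).real

def i_functional(t_mat, x):
    """I(T, X) = tr(X* (log T) X T - X* X (log T) T), as in (IX.48)."""
    lt, xs = mlog(t_mat), x.conj().T
    return np.trace(xs @ lt @ x @ t_mat - xs @ x @ lt @ t_mat).real

def block_identity_gap(a, b):
    """|S(A|B) + I(T, X)| for T = diag(A, B) and X = [[0, 0], [I, 0]]."""
    n = a.shape[0]
    z, e = np.zeros((n, n)), np.eye(n)
    t_mat = np.block([[a, z], [z, b]])
    x = np.block([[z, z], [e, z]])
    return abs(rel_entropy(a, b) + i_functional(t_mat, x))

def random_positive(n, rng):
    """Random Hermitian positive definite matrix (illustrative helper)."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return g @ g.conj().T + 0.1 * np.eye(n)
```

With these helpers, midpoint convexity can be tested by comparing `rel_entropy((a1 + a2) / 2, (b1 + b2) / 2)` with the average of `rel_entropy(a1, b1)` and `rel_entropy(a2, b2)`.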


Exercise IX.6.6 (i) Show that for every pinching $\mathcal{C}$,
$S(\mathcal{C}(A)|\mathcal{C}(B)) \le S(A|B)$.

(ii) Let $\tau$ be the normalised trace function on $n \times n$ matrices;
i.e., $\tau(A) = \frac{1}{n} \operatorname{tr} A$. Show that for all positive
matrices $A, B$

\[
  \tau(A)(\log \tau(A) - \log \tau(B)) \le \tau(A(\log A - \log B)). \tag{IX.50}
\]

This is called the Peierls-Bogoliubov Inequality. (There are other
inequalities that go by the same name.)


IX.7 Operator Approximation 


An operator approximation problem consists of finding, for a given oper- 
ator A, the element nearest to it from a special class. Some problems of 
this type are studied in this section. In formulating and interpreting these 
results, it is helpful to have an analogy: if arbitrary operators are thought 
of as complex numbers, then Hermitian operators should be thought of as 
real numbers, unitary operators as complex numbers of modulus one and 
positive operators as positive real numbers. Of course, this analogy has its 
limitations, since multiplication of complex numbers is commutative and 
that of operators is not. 

The first theorem below is easy to prove and sets the stage for later 
results. 


Theorem IX.7.1 Let $A$ be any operator and let
$\operatorname{Re} A = \frac{1}{2}(A + A^*)$. Then, for every Hermitian
operator $H$ and for every unitarily invariant norm,

\[
  |||A - \operatorname{Re} A||| \le |||A - H|||. \tag{IX.51}
\]

Proof. Recall that $|||T||| = |||T^*|||$ for every $T$. Using this fact and the
triangle inequality, we have

\[
\begin{aligned}
  |||A - \tfrac{1}{2}(A + A^*)|||
  &= \tfrac{1}{2}\,|||A - A^*||| = \tfrac{1}{2}\,|||A - H + H - A^*||| \\
  &\le \tfrac{1}{2}\bigl(|||A - H||| + |||(A - H)^*|||\bigr) = |||A - H|||.
\end{aligned}
\]

This proves the theorem. ∎


The inequality (IX.51) is sometimes stated in words as: in every unitarily 
invariant norm Re A is a Hermitian approximant to A. 
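As a numerical illustration of Theorem IX.7.1, the sketch below compares the distance from a random matrix $A$ to $\operatorname{Re} A$ with its distance to other Hermitian matrices in three unitarily invariant norms (operator, Frobenius and trace), all computed from singular values. The function names are hypothetical, and the competitors are random Hermitian perturbations of $\operatorname{Re} A$.

```python
import numpy as np

def ui_norms(m):
    """Operator, Frobenius and trace norms of m, from its singular values."""
    s = np.linalg.svd(m, compute_uv=False)
    return np.array([s[0], np.sqrt(np.sum(s**2)), np.sum(s)])

def hermitian_part(a):
    """Re A = (A + A*) / 2."""
    return (a + a.conj().T) / 2

def check_hermitian_approximant(n=4, trials=200, seed=0):
    """True if no sampled Hermitian H is closer to A than Re A in any norm."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    best = ui_norms(a - hermitian_part(a))
    for _ in range(trials):
        g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        h = hermitian_part(a) + 0.5 * hermitian_part(g)   # competing Hermitian H
        if np.any(ui_norms(a - h) < best - 1e-12):
            return False
    return True
```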

The next theorem says that a unitary approximant to A is any unitary 
that occurs in its polar decomposition. 


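Theorem IX.7.2 If $A = UP$, where $U$ is unitary and $P$ positive, then

\[
  |||A - U||| \le |||A - W||| \le |||A + U||| \tag{IX.52}
\]

for every unitary $W$ and for every unitarily invariant norm.

Proof. By unitary invariance, the inequality (IX.52) is equivalent to

\[
  |||P - I||| \le |||P - U^*W||| \le |||P + I|||.
\]

So the assertion of the theorem is equivalent to the following: for every
positive operator $P$ and unitary operator $V$,

\[
  |||P - I||| \le |||P - V||| \le |||P + I|||. \tag{IX.53}
\]

This will be proved using the spectral perturbation inequality (IV.62). Let

\[
  \tilde{P} = \begin{pmatrix} 0 & P \\ P & 0 \end{pmatrix}, \qquad
  \tilde{V} = \begin{pmatrix} 0 & V \\ V^* & 0 \end{pmatrix}.
\]

Then $\tilde{P}$ and $\tilde{V}$ are Hermitian. The eigenvalues of $\tilde{P}$
are the singular values of $P$ together with their negatives. (See Exercise
II.1.15.) The same is true for $\tilde{V}$, which means that it has eigenvalues
1 and $-1$, each with multiplicity $n$. We thus have

\[
\begin{aligned}
  \operatorname{Eig}^{\downarrow}(\tilde{P}) - \operatorname{Eig}^{\downarrow}(\tilde{V})
  &= [\operatorname{Eig}^{\downarrow}(P) - I] \oplus [-\operatorname{Eig}^{\uparrow}(P) + I], \\
  \operatorname{Eig}^{\downarrow}(\tilde{P}) - \operatorname{Eig}^{\uparrow}(\tilde{V})
  &= [\operatorname{Eig}^{\downarrow}(P) + I] \oplus [-\operatorname{Eig}^{\uparrow}(P) - I].
\end{aligned}
\]

So, from (IV.62), we have

\[
\begin{aligned}
  |||[\operatorname{Eig}^{\downarrow}(P) - I] \oplus [\operatorname{Eig}^{\downarrow}(P) - I]|||
  &\le |||(P - V) \oplus (P - V)^*||| \\
  &\le |||[\operatorname{Eig}^{\downarrow}(P) + I] \oplus [\operatorname{Eig}^{\downarrow}(P) + I]|||.
\end{aligned}
\]

This is equivalent to the pair of inequalities (IX.53). ∎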
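In computations, a unitary factor of the polar decomposition is usually read off from a singular value decomposition: if $A = W_1 \Sigma W_2^*$, then $U = W_1 W_2^*$ and $P = W_2 \Sigma W_2^*$. The following sketch (hypothetical function names, random test data) uses this to check both inequalities in (IX.52) against randomly sampled unitaries.

```python
import numpy as np

def polar_unitary(a):
    """Unitary factor U of a polar decomposition A = U P, obtained from the SVD."""
    w1, _, w2h = np.linalg.svd(a)
    return w1 @ w2h

def random_unitary(n, rng):
    """A unitary matrix from the QR factorisation of a complex Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return q

def ui_norms(m):
    """Operator, Frobenius and trace norms of m, from its singular values."""
    s = np.linalg.svd(m, compute_uv=False)
    return np.array([s[0], np.sqrt(np.sum(s**2)), np.sum(s)])

def check_unitary_approximant(n=4, trials=200, seed=0):
    """True if |||A - U||| <= |||A - W||| <= |||A + U||| for every sampled W."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    u = polar_unitary(a)
    lo, hi = ui_norms(a - u), ui_norms(a + u)
    for _ in range(trials):
        mid = ui_norms(a - random_unitary(n, rng))
        if np.any(mid < lo - 1e-12) or np.any(mid > hi + 1e-12):
            return False
    return True
```

The same factor $U$ gives, for the Frobenius norm, the orthonormal basis closest to a given set of linearly independent vectors.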


The two approximation problems solved above are subsumed in a more
general question. Let $\Phi$ be a closed subset of the complex plane, and let
$\mathcal{N}(\Phi)$ be the collection of all normal operators whose spectrum is
contained in $\Phi$. Given any operator $A$, what operator in
$\mathcal{N}(\Phi)$ is closest to $A$? The two theorems proved above answer
this when $\Phi$ is the real line or the unit circle. When $\Phi$ is the whole
plane or the positive half-line, the problem becomes much harder, and the full
solution is not known. Note that in the first case ($\Phi = \mathbb{C}$) we are
asking for a normal approximant to $A$, and in the second case
($\Phi = \mathbb{R}_+$) for a positive approximant to $A$. Some results
on this problem, which are easy to describe and also are directly related to
other parts of this book, are given below.

We have already come across a special case of this problem in Chapter 6.
Let $F$ be a retraction of the plane onto the subset $\Phi$; i.e., $F$ is a map
of $\mathbb{C}$ onto $\Phi$ such that $|z - F(z)| \le |z - w|$ for all
$z \in \mathbb{C}$ and $w \in \Phi$. Such a map always exists; it is unique if
(and only if) $\Phi$ is convex. We have the following theorem.


Theorem IX.7.3 Let $F$ be a retraction of the plane onto the closed set
$\Phi$. Suppose $\Phi$ is convex. Then, for every normal operator $A$, we have

\[
  |||A - F(A)||| \le |||A - N||| \tag{IX.54}
\]

for all $N \in \mathcal{N}(\Phi)$ and for all unitarily invariant norms. If the
set $\Phi$ is not convex, the inequality (IX.54) may not be true for all
unitarily invariant norms, but is still true for all Q-norms. (See Theorem
VI.6.2 and Problem VI.8.13.)


Exercise IX.7.4 Let $A$ be a Hermitian operator, and let $A = A^+ - A^-$ be
its Jordan decomposition. (Both $A^+$ and $A^-$ are positive operators.) Use
the above theorem to show that, if $P$ is any positive operator, then

\[
  |||A - A^+||| \le |||A - P||| \tag{IX.55}
\]

for every unitarily invariant norm. If $A$ is normal, then for every positive
operator $P$

\[
  |||A - (\operatorname{Re} A)^+||| \le |||A - P|||. \tag{IX.56}
\]


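Theorem IX.7.5 Let $A$ be any operator. Then for every positive operator $P$

\[
  \|A - (\operatorname{Re} A)^+\|_2 \le \|A - P\|_2. \tag{IX.57}
\]

Proof. Recall that $\|A\|_2^2 = \|\operatorname{Re} A\|_2^2 + \|\operatorname{Im} A\|_2^2$.
Hence,

\[
  \|A - (\operatorname{Re} A)^+\|_2^2
  = \|\operatorname{Re} A - (\operatorname{Re} A)^+\|_2^2 + \|\operatorname{Im} A\|_2^2.
\]

From (IX.55), we see that $\|\operatorname{Re} A - (\operatorname{Re} A)^+\|_2^2$
is bounded by $\|\operatorname{Re} A - P\|_2^2$. Since $P$ is Hermitian, we also
have $\|A - P\|_2^2 = \|\operatorname{Re} A - P\|_2^2 + \|\operatorname{Im} A\|_2^2$.
This leads to the inequality (IX.57). ∎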
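Theorem IX.7.5 gives an explicit recipe in the Frobenius (Hilbert-Schmidt) norm: take the Hermitian part and discard its negative eigenvalues. A minimal sketch, with hypothetical function names and random positive competitors, is the following.

```python
import numpy as np

def positive_part(h):
    """H^+ in the Jordan decomposition H = H^+ - H^- of a Hermitian H."""
    w, v = np.linalg.eigh(h)
    return (v * np.clip(w, 0.0, None)) @ v.conj().T

def frobenius_positive_approximant(a):
    """(Re A)^+, the approximant of Theorem IX.7.5."""
    return positive_part((a + a.conj().T) / 2)

def check_positive_approximant(n=4, trials=300, seed=0):
    """True if no sampled positive P beats (Re A)^+ in the Frobenius norm."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    best = np.linalg.norm(a - frobenius_positive_approximant(a))  # Frobenius norm
    for _ in range(trials):
        g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        p = g @ g.conj().T / n                                    # a random positive P
        if np.linalg.norm(a - p) < best - 1e-12:
            return False
    return True
```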


The problem of finding positive approximants to an arbitrary operator 
A is much more complex for other norms. See the Notes at the end of the 
chapter. 

For normal approximants, we have a solution in all unitarily invariant
norms only in the $2 \times 2$ case.

Theorem IX.7.6 Let $A$ be an upper triangular matrix

\[
  A = \begin{pmatrix} \lambda_1 & b \\ 0 & \lambda_2 \end{pmatrix}, \qquad b \ge 0. \tag{IX.58}
\]

Let $\theta = \arg(\lambda_1 - \lambda_2)$, and let

\[
  N_0 = \begin{pmatrix} \lambda_1 & \frac{1}{2}b \\ \frac{1}{2}b\,e^{2i\theta} & \lambda_2 \end{pmatrix}. \tag{IX.59}
\]

Then $N_0$ is normal, and for any normal matrix $N$ we have

\[
  |||A - N_0||| \le |||A - N||| \tag{IX.60}
\]

for every unitarily invariant norm.

Proof. It is easy to check that $N_0$ is normal. Let
$N = \begin{pmatrix} x_1 & x_2 \\ x_3 & x_4 \end{pmatrix}$ be any normal matrix.
We must have $|x_2| = |x_3|$ in that case.

Now note that, if $T = \begin{pmatrix} t_1 & t_2 \\ t_3 & t_4 \end{pmatrix}$ is
any matrix, we can write its off-diagonal part
$\begin{pmatrix} 0 & t_2 \\ t_3 & 0 \end{pmatrix}$ as $\frac{1}{2}(T - UTU^*)$,
where $U$ is the diagonal matrix with diagonal entries 1 and $-1$. Hence, for
every unitarily invariant norm,

\[
  |||\operatorname{diag}(t_2, t_3)|||
  = \Bigl|\Bigl|\Bigl| \begin{pmatrix} 0 & t_2 \\ t_3 & 0 \end{pmatrix} \Bigr|\Bigr|\Bigr|
  \le |||T|||.
\]

Using this, we see that $|||A - N||| \ge |||\operatorname{diag}(b - x_2, -x_3)|||$.
But,

\[
  b \le |b - x_2| + |x_2| = |b - x_2| + |x_3| \le 2 \max(|b - x_2|, |x_3|).
\]

Thus the vector $\frac{1}{2}(b, b)$ is weakly majorised by the vector
$(|b - x_2|, |x_3|)$, which, in turn, is weakly majorised by the vector
$(s_1(A - N), s_2(A - N))$ as seen above. Since $A - N_0$ has singular values
$(\frac{1}{2}b, \frac{1}{2}b)$, this proves the inequality (IX.60). ∎


Since every 2 x 2 matrix is unitarily equivalent to an upper triangular 
matrix of the form (IX.58), this theorem tells us how to find a normal 
matrix closest to it. 
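The formula for $N_0$ is simple enough to test directly. The sketch below assumes $N_0$ has the entries $\lambda_1$, $\frac{1}{2}b$ in its first row and $\frac{1}{2}b\,e^{2i\theta}$, $\lambda_2$ in its second row; it builds $N_0$ for one sample triangular matrix, confirms that it is normal, and compares $|||A - N_0|||$ with the distance to randomly generated normal matrices in three unitarily invariant norms. The function names and the sample values are hypothetical.

```python
import numpy as np

def normal_approximant(l1, l2, b):
    """N_0 for A = [[l1, b], [0, l2]] with b >= 0 and theta = arg(l1 - l2)."""
    theta = np.angle(l1 - l2)
    return np.array([[l1, b / 2], [np.exp(2j * theta) * b / 2, l2]])

def random_normal(rng):
    """A random 2 x 2 normal matrix Q diag(z) Q* with Q unitary."""
    q, _ = np.linalg.qr(rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)))
    z = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    return (q * z) @ q.conj().T

def ui_norms(m):
    """Operator, Frobenius and trace norms of m, from its singular values."""
    s = np.linalg.svd(m, compute_uv=False)
    return np.array([s[0], np.sqrt(np.sum(s**2)), np.sum(s)])

def check_normal_approximant(l1=1 + 2j, l2=-0.5j, b=3.0, trials=500, seed=0):
    rng = np.random.default_rng(seed)
    a = np.array([[l1, b], [0, l2]])
    n0 = normal_approximant(l1, l2, b)
    is_normal = np.allclose(n0 @ n0.conj().T, n0.conj().T @ n0)
    best = ui_norms(a - n0)          # singular values of A - N_0 are (b/2, b/2)
    ok = all(np.all(ui_norms(a - random_normal(rng)) >= best - 1e-12)
             for _ in range(trials))
    return is_normal, best, ok
```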


Exercise IX.7.7 The measure of nonnormality of a matrix, with respect to
any norm, was defined in Problems VIII.6.4 and VIII.6.5. Theorem IX.7.6,
on the other hand, gives for $2 \times 2$ matrices a formula for the distance
to the set of all normal matrices. What is the relation between these two
numbers for a given unitarily invariant norm?

IX.8 Problems


Problem IX.8.1. Let $A, B$ be positive matrices, and let $m, k$ be positive
integers with $m \ge k$. Use the inequality (IX.13) to show that

\[
  \operatorname{tr}(A^kB^k)^m \le \operatorname{tr}(A^mB^m)^k. \tag{IX.61}
\]

The special case,

\[
  \operatorname{tr}(AB)^m \le \operatorname{tr} A^mB^m, \tag{IX.62}
\]

is called the Lieb-Thirring inequality.


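Problem IX.8.2. Let $A, B$ be Hermitian matrices. Show that for every
positive integer $m$

(i) $|\operatorname{tr}(AB)^{2m}| \le \operatorname{tr} A^{2m}B^{2m}$,
(ii) $|\operatorname{tr}(A^mB^m)^2| \le \operatorname{tr} A^{2m}B^{2m}$,
(iii) $|\operatorname{tr}(AB)^{4m}| \le \operatorname{tr}(A^{2m}B^{2m})^2$.

(Hint: By the Weyl Majorant Theorem $|\operatorname{tr} X^m| \le \operatorname{tr}|X|^m$,
for every matrix $X$.) Note that if

\[
  A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
  B = \begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix},
\]

then $|\operatorname{tr}(AB)^3| = 5$, $|\operatorname{tr} A^3B^3| = 4$,
$\operatorname{tr}(AB)^6 = 9$, and $\operatorname{tr}(A^3B^3)^2 = 0$.
This shows the failure of possible extensions of the inequalities (i) and (iii)
above.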
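The four numbers quoted in this counterexample can be reproduced with exact integer arithmetic. The pair of matrices below is an assumption on our part: it is one Hermitian pair that yields exactly the values 5, 4, 9 and 0, and other pairs do so as well.

```python
import numpy as np
from numpy.linalg import matrix_power

# Assumed pair of real symmetric matrices (one choice giving the stated traces).
A = np.array([[1, 1], [1, -1]])
B = np.array([[-1, 1], [1, 0]])

def counterexample_traces():
    """Return (|tr (AB)^3|, |tr A^3 B^3|, tr (AB)^6, tr (A^3 B^3)^2)."""
    ab = A @ B
    a3b3 = matrix_power(A, 3) @ matrix_power(B, 3)
    return (abs(np.trace(matrix_power(ab, 3))),
            abs(np.trace(a3b3)),
            np.trace(matrix_power(ab, 6)),
            np.trace(matrix_power(a3b3, 2)))
```

Since $5 > 4$ and $9 > 0$, the odd-power analogues of (i) and (iii) fail for this pair.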


Problem IX.8.3. Are there any natural generalisations of the above
inequalities when three matrices $A, B, C$ are involved? Take, for instance,
the inequality (IX.62). A product of three positive matrices need not have
positive eigenvalues. One still might wonder whether
$|\operatorname{tr}(ABC)^2| \le |\operatorname{tr} A^2B^2C^2|$. Construct an
example to show that this need not be true.


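Problem IX.8.4. A possible generalisation of the Golden-Thompson
inequality (IX.19) would have been
$\operatorname{tr}(e^{A+B+C}) \le |\operatorname{tr}(e^Ae^Be^C)|$ for any three
Hermitian matrices $A, B, C$. This is false. To see this, let $S_1, S_2, S_3$
be the Pauli spin matrices

\[
  S_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
  S_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
  S_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]

If $a_1, a_2, a_3$ are any real numbers and $a = (a_1^2 + a_2^2 + a_3^2)^{1/2}$,
show that

\[
  \exp\Bigl(\sum_j a_jS_j\Bigr) = (\cosh a)\,I + \frac{\sinh a}{a} \sum_j a_jS_j.
\]

Let

\[
  A = tS_1, \qquad B = tS_2, \qquad C = t(S_3 - S_2 - S_1).
\]

Show that

\[
\begin{aligned}
  \operatorname{tr}(e^{A+B+C}) &= 2\cosh t, \\
  |\operatorname{tr}(e^Ae^Be^C)| &= 2\cosh t\,\Bigl[1 - \frac{t^4}{6} + O(t^6)\Bigr].
\end{aligned}
\]

For small $t$, the first quantity is bigger than the second.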
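Both claims in this problem can be checked numerically before attempting the algebra. The sketch below assumes the standard Pauli matrices and the choice $A = tS_1$, $B = tS_2$, $C = t(S_3 - S_2 - S_1)$; exponentials of Hermitian matrices are computed from the spectral decomposition. The function names are hypothetical.

```python
import numpy as np

S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S2 = np.array([[0, -1j], [1j, 0]])
S3 = np.array([[1, 0], [0, -1]], dtype=complex)

def expm_hermitian(h):
    """exp(H) for a Hermitian matrix H, via its spectral decomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(w)) @ v.conj().T

def pauli_exp_formula(a1, a2, a3):
    """(cosh a) I + (sinh a / a) sum_j a_j S_j, for a nonzero vector (a1, a2, a3)."""
    a = np.sqrt(a1**2 + a2**2 + a3**2)
    return np.cosh(a) * np.eye(2) + (np.sinh(a) / a) * (a1 * S1 + a2 * S2 + a3 * S3)

def golden_thompson_sides(t):
    """Return (tr exp(A+B+C), |tr exp(A) exp(B) exp(C)|) for the matrices above."""
    a, b, c = t * S1, t * S2, t * (S3 - S2 - S1)
    lhs = np.trace(expm_hermitian(a + b + c)).real
    rhs = abs(np.trace(expm_hermitian(a) @ expm_hermitian(b) @ expm_hermitian(c)))
    return lhs, rhs
```

For small positive `t`, `golden_thompson_sides(t)` returns a first entry equal to `2 * cosh(t)` and a strictly smaller second entry, which is the failure of the proposed three-matrix inequality.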


Problem IX.8.5. Show that the Lie Product Formula has a generalisation:
for any $k$ matrices $A_1, A_2, \ldots, A_k$,

\[
  \lim_{m \to \infty} \Bigl(\exp\frac{A_1}{m}\,\exp\frac{A_2}{m} \cdots \exp\frac{A_k}{m}\Bigr)^m
  = \exp(A_1 + A_2 + \cdots + A_k).
\]


Problem IX.8.6. Show that for any two matrices $A, B$ we have

\[
  |||A^*B + B^*A||| \le |||AA^* + BB^*|||
\]

and

\[
  |||A^*B + B^*A||| \le |||A^*A + B^*B|||
\]

for every unitarily invariant norm.


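Problem IX.8.7. Let $X, Y$ be positive. Show that for every unitarily
invariant norm

\[
  |||X - Y||| \le \Bigl|\Bigl|\Bigl| \begin{pmatrix} X & 0 \\ 0 & Y \end{pmatrix} \Bigr|\Bigr|\Bigr|.
\]

From this, it follows that, for every $A$,

\[
  \|A^*A - AA^*\| \le \|A\|^2,
\]

and

\[
  \|A^*A - AA^*\|_p \le 2^{1/p}\,\|A\|_{2p}^2, \qquad 1 \le p \le \infty.
\]

Problem IX.8.8. Let $A, B$ be positive matrices and let $X$ be any matrix.
Show that for all unitarily invariant norms, and for $0 \le \nu \le 1$,

\[
  |||A^{\nu}XB^{1-\nu} - A^{1-\nu}XB^{\nu}||| \le |2\nu - 1|\,|||AX - XB|||.
\]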


Problem IX.8.9. Let $A, B$ be positive operators and let $T$ be any operator
such that $\|T^*x\| \le \|Ax\|$ and $\|Tx\| \le \|Bx\|$ for all $x$. Show that,
for all $x, y$ and for $0 \le \nu \le 1$,

\[
  |\langle x, Ty\rangle| \le \|A^{1-\nu}x\|\,\|B^{\nu}y\|.
\]

[Hint: From the hypotheses, it follows that $A^{-1}T$ and $TB^{-1}$ are
contractions. The inequality (IX.38) then implies that
$(A^{-1})^{1-\nu}T(B^{-1})^{\nu}$ is a contraction.]


Problem IX.8.10. Use the result of the above problem to prove the
following. For all operators $T$, vectors $x, y$, and for $0 \le \nu \le 1$,

\[
  |\langle x, Ty\rangle|^2 \le \langle x, |T^*|^{2(1-\nu)}x\rangle\,\langle y, |T|^{2\nu}y\rangle.
\]

This inequality is called the Mixed Schwarz Inequality.

Problem IX.8.11. Show that if $A, B$ are positive matrices, then we have

\[
  \det(I + A + B) \le \det(I + A)\,\det(I + B).
\]

Then use this and Theorem IX.5.11 to show that, for any two matrices $A, B$,

\[
  |\det(I + A + B)| \le \det(I + |A|)\,\det(I + |B|).
\]

(See Problem IV.5.9 for another proof of this.)


Problem IX.8.12. Show that for all positive matrices $A, B$

\[
  \operatorname{tr}(A(\log A - \log B)) \ge \operatorname{tr}(A - B). \tag{IX.63}
\]
1 0 loé 
The example A = 0 2 B= shows that we may not have 


2 
the operator inequality A(log A — log B) > (A — B). 


Problem IX.8.13. Let $f$ be a convex function on an interval $I$. Let $A, B$
be two Hermitian matrices whose spectra are contained in $I$. Show that

\[
  \operatorname{tr}[f(A) - f(B)] \ge \operatorname{tr}[(A - B)f'(B)]. \tag{IX.64}
\]

The special choice $f(t) = t \log t$ gives the inequality (IX.63).


Problem IX.8.14. Let $A$ be a Hermitian matrix and $f$ any convex function.
Then for every unit vector $x$

\[
  f(\langle x, Ax\rangle) \le \langle x, f(A)x\rangle.
\]

This implies that, for any orthonormal basis $x_1, \ldots, x_n$,

\[
  \sum_j f(\langle x_j, Ax_j\rangle) \le \operatorname{tr} f(A).
\]

The name Peierls-Bogoliubov inequality is sometimes used for this inequality,
or for its special cases $f(t) = e^t$, $f(t) = e^{-t}$, etc.

Problem IX.8.15. The concavity assertion in Exercise IX.6.4 can be
generalised to several variables. Let $t_1, t_2, t_3$ be positive numbers such
that $t_1 + t_2 + t_3 \le 1$. Let $A_1, A_2, A_3$ be positive operators. Note
that

\[
  A_1^{t_1} \otimes A_2^{t_2} \otimes A_3^{t_3}
  = (A_1^{t_1} \otimes A_2^{t_2} \otimes I)(I \otimes I \otimes A_3^{t_3}).
\]

Use the concavity of the first factor above (which has been proved in
Exercise IX.6.4) and the integral representation (V.4) for the second factor to
prove that the map $(A_1, A_2, A_3) \to A_1^{t_1} \otimes A_2^{t_2} \otimes A_3^{t_3}$
is jointly concave on triples of positive operators. More generally, prove that
for positive numbers $t_1, \ldots, t_k$ with $t_1 + \cdots + t_k \le 1$, the map
that takes a $k$-tuple of positive operators $(A_1, \ldots, A_k)$ to the
operator $A_1^{t_1} \otimes \cdots \otimes A_k^{t_k}$ is jointly concave.


Problem IX.8.16. A special consequence of the above is that the map
$A \to \otimes^k A^{1/k}$ is concave on positive operators for all
$k = 1, 2, \ldots$. Use this to prove the following inequalities for
$n \times n$ positive matrices $A, B$:

(i) $\otimes^k (A + B)^{1/k} \ge \otimes^k A^{1/k} + \otimes^k B^{1/k}$,
(ii) $\wedge^k (A + B)^{1/k} \ge \wedge^k A^{1/k} + \wedge^k B^{1/k}$,
(iii) $\vee^k (A + B)^{1/k} \ge \vee^k A^{1/k} + \vee^k B^{1/k}$,
(iv) $\det (A + B)^{1/n} \ge \det A^{1/n} + \det B^{1/n}$,
(v) $\operatorname{per} (A + B)^{1/n} \ge \operatorname{per} A^{1/n} + \operatorname{per} B^{1/n}$,
(vi) $c_k((A + B)^{1/k}) \ge c_k(A^{1/k}) + c_k(B^{1/k})$,

where $c_k(A) = \operatorname{tr} \wedge^k(A)$ for $1 \le k \le n$.
The inequality (iv) above is called the Minkowski Determinant Theorem
and has been proved earlier (Corollary II.3.21).


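Problem IX.8.17. Outlined below is another proof of the Lieb Concavity
Theorem which uses results on operator concave functions proved in
Chapter 5.

(i) Consider the space $\mathcal{L}(\mathcal{H}) \oplus \mathcal{L}(\mathcal{H})$
with the inner product

\[
  \langle (R_1, R_2), (S_1, S_2)\rangle = \operatorname{tr}(R_1^*S_1 + R_2^*S_2).
\]

(ii) Let $A_1, A_2$ be invertible positive operators on $\mathcal{H}$ and let
$A = \frac{1}{2}(A_1 + A_2)$. Let

\[
\begin{aligned}
  \Delta(R) &= ARA^{-1}, \\
  \Delta_{12}(R, S) &= (A_1RA_1^{-1}, A_2SA_2^{-1}).
\end{aligned}
\]

Then $\Delta$ is a positive operator on the Hilbert space
$\mathcal{L}(\mathcal{H})$ and $\Delta_{12}$ is a positive operator on the
Hilbert space $\mathcal{L}(\mathcal{H}) \oplus \mathcal{L}(\mathcal{H})$.

(iii) Note that for any $X$ in $\mathcal{L}(\mathcal{H})$

\[
  \operatorname{tr} X^*A^tXA^{1-t} = \langle XA^{1/2}, \Delta^t(XA^{1/2})\rangle
\]

and

\[
  \operatorname{tr}(X^*A_1^tXA_1^{1-t} + X^*A_2^tXA_2^{1-t})
  = \langle (XA_1^{1/2}, XA_2^{1/2}), \Delta_{12}^t(XA_1^{1/2}, XA_2^{1/2})\rangle.
\]

(iv) Let $V$ be the map from $\mathcal{L}(\mathcal{H})$ into
$\mathcal{L}(\mathcal{H}) \oplus \mathcal{L}(\mathcal{H})$ defined as

\[
  V(XA^{1/2}) = \frac{1}{\sqrt{2}}(XA_1^{1/2}, XA_2^{1/2}).
\]

Show that $V$ is an isometry. Show that

\[
  \Delta = V^*\Delta_{12}V.
\]

(v) Since the function $f(x) = x^t$ on $(0, \infty)$ is operator concave for
$0 \le t \le 1$, using Exercise V.2.4, we obtain

\[
  V^*\Delta_{12}^tV \le (V^*\Delta_{12}V)^t = \Delta^t.
\]

(vi) This shows that

\[
  \operatorname{tr} X^*A^tXA^{1-t}
  \ge \tfrac{1}{2} \operatorname{tr}(X^*A_1^tXA_1^{1-t} + X^*A_2^tXA_2^{1-t})
\]

when $A_1$ and $A_2$ are invertible. By continuity, this is true for all
positive operators $A_1$ and $A_2$. In other words, for all $0 \le t \le 1$,
the function

\[
  f(A) = \operatorname{tr} X^*A^tXA^{1-t}
\]

is concave.

(vii) Use $2 \times 2$ operator matrices
$\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$ and
$\begin{pmatrix} 0 & X \\ 0 & 0 \end{pmatrix}$ to complete the proof
of Lieb's concavity theorem.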


Problem IX.8.18. Theorem IX.7.1 can be generalised as follows. Let $\varphi$
be a mapping of the space of $n \times n$ matrices into itself that satisfies
three conditions:

(i) $\varphi^2$ is the identity map; i.e., $\varphi(\varphi(A)) = A$ for all $A$.

(ii) $\varphi$ is real linear; i.e.,
$\varphi(\alpha A + \beta B) = \alpha\varphi(A) + \beta\varphi(B)$ for all
$A, B$ and all real $\alpha, \beta$.

(iii) $A$ and $\varphi(A)$ have the same singular values for all $A$.

Then the set $I(\varphi) = \{A : \varphi(A) = A\}$ is a real linear subspace of
the space of matrices. For each $A$, the matrix $\frac{1}{2}(A + \varphi(A))$ is
in $I(\varphi)$, and for all unitarily invariant norms
$|||A - \frac{1}{2}(A + \varphi(A))||| \le |||A - B|||$ for all
$B \in I(\varphi)$.

Examples of such maps are $\varphi(A) = \pm A^*$, $\varphi(A) = \pm A^T$, and
$\varphi(A) = \pm \bar{A}$, where $A^T$ denotes the transpose of $A$ and
$\bar{A}$ denotes the matrix obtained by taking the complex conjugate of each
entry of $A$.


Problem IX.8.19. The Cayley transform of a Hermitian matrix $A$ is
the unitary matrix $C(A)$ defined as

\[
  C(A) = (A - iI)(A + iI)^{-1}.
\]

If $A, B$ are two Hermitian matrices, we have

\[
  \frac{1}{2i}[C(A) - C(B)] = (B + iI)^{-1}(A - B)(A + iI)^{-1}.
\]

Use this to show that for all $j$,

\[
  \tfrac{1}{2}\,s_j(C(A) - C(B)) \le s_j(A - B).
\]

[Note that $\|(A + iI)^{-1}\| \le 1$ and $\|(B + iI)^{-1}\| \le 1$.] In
particular, this gives

\[
  \tfrac{1}{2}\,|||C(A) - C(B)||| \le |||A - B|||
\]

for every unitarily invariant norm.


Problem IX.8.20. A $2 \times 2$ block matrix
$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$, in which
the four matrices $A_{ij}$ are normal and commute with each other, is called
binormal. Show that such a matrix is unitarily equivalent to a matrix
$\tilde{A} = \begin{pmatrix} A_1 & B \\ 0 & A_2 \end{pmatrix}$, in which
$A_1, A_2, B$ are diagonal matrices and $B$ is positive. Let

\[
  N_0 = \begin{pmatrix} A_1 & \frac{1}{2}B \\ \frac{1}{2}U^2B & A_2 \end{pmatrix},
\]

where $U$ is the unitary operator such that $A_1 - A_2 = U|A_1 - A_2|$. Show
that in every unitarily invariant norm we have

\[
  |||\tilde{A} - N_0||| \le |||\tilde{A} - N|||
\]

for all $2n \times 2n$ normal matrices $N$.


Problem IX.8.21. An alternate proof of the inequality (IX.55) is outlined
below. Choose an orthonormal basis in which $A$ is diagonal and
$A = \begin{pmatrix} A^+ & 0 \\ 0 & -A^- \end{pmatrix}$. In this basis let $P$
have the block decomposition
$P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}$.
By the pinching inequality,

\[
  |||A - P||| \ge \Bigl|\Bigl|\Bigl|
  \begin{pmatrix} A^+ - P_{11} & 0 \\ 0 & -A^- - P_{22} \end{pmatrix}
  \Bigr|\Bigr|\Bigr|.
\]

Since both $A^-$ and $P_{22}$ are positive, $|||A^-||| \le |||A^- + P_{22}|||$.
Use this to prove the inequality (IX.55).

This argument can be modified to give another proof of (IX.56) also.
For this we need the following fact. Let $T$ and $S$ be operators such that
$0 \le \operatorname{Re} T \le \operatorname{Re} S$, and
$\operatorname{Im} T = \operatorname{Im} S$. If $T$ is normal, then
$|||T||| \le |||S|||$, for every unitarily invariant norm. Prove this using the
Fan Dominance Theorem (Theorem IV.2.2) and the result of Problem III.6.6.


Kelly, Eigenvalue inequalities for products of matrix exponentials, Linear
Algebra Appl., 45(1982) 55-95, in D. Petz, A survey of certain trace
inequalities, Banach Centre Publications 30(1994) 287-298, and in Chapter 6
of Horn and Johnson, Topics in Matrix Analysis.

Theorem IX.4.2 was proved in R. Bhatia and F. Kittaneh, On the
singular values of a product of operators, SIAM J. Matrix Analysis, 11(1990)
272-277. The generalisation given in Theorem IX.4.5 is due to R. Bhatia
and C. Davis, More matrix forms of the arithmetic-geometric mean
inequality, SIAM J. Matrix Analysis, 14(1993) 132-136. Many of the other
results in Section IX.4 are from these two papers. The proof outlined in
Exercise IX.4.6 is due to F. Kittaneh, A note on the arithmetic-geometric
mean inequality for matrices, Linear Algebra Appl., 171(1992) 1-8. A
generalisation of the inequality (IX.21) has been proved by T. Ando, Matrix
Young inequalities, Operator Theory: Advances and Applications, 75(1995)
33-38. If $p, q > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, then the operator
inequality $|AB^*| \le U(\frac{1}{p}|A|^p + \frac{1}{q}|B|^q)U^*$ is valid for
some unitary $U$.

Theorems IX.5.1 and IX.5.2 were proved in R. Bhatia and C. Davis,
A Cauchy-Schwarz inequality for operators with applications, Linear Al- 
gebra Appl., 223(1995) 119-129. For the case of the operator norm, the 
inequality (IX.38) is due to E. Heinz, as are the inequality (IX.29) and 
the one in Problem IX.8.8. See E. Heinz, Beiträge zur Störungstheorie der
Spektralzerlegung, Math. Ann., 123(1951) 415-438. Our approach to these 
inequalities follows the one in the paper by A. McIntosh cited above. The 
inequality in Problem IX.8.9 is also due to E. Heinz. The Mixed Schwarz 
inequality in Problem IX.8.10 was proved by T. Kato, Notes on some in- 
equalities for linear operators, Math. Ann., 125(1952) 208-212. (The papers 
by Heinz, Kato, and McIntosh do much of this for unbounded operators 
in infinite-dimensional spaces.) The class $\mathcal{L}$ in Definition IX.5.6
was introduced by E.H. Lieb, Inequalities for some operator and matrix
functions, Advances in Math., 20(1976) 174-178. Theorem IX.5.11 was proved
in this paper. These functions are also studied in R. Merris and J.A. 
Dias da Silva, Generalized Schur functions, J. Algebra, 35(1975) 442-448. 
B. Simon (Trace Ideals, p. 99) calls them Liebian functions. The character- 
isation in Theorem IX.5.10 has not appeared before; it simplifies the proof 
of Theorem IX.5.11 considerably. 

The Lieb Concavity Theorem was proved by E.H. Lieb, Convex trace 
functions and the Wigner-Yanase-Dyson conjecture, Advances in Math., 
11(1973) 267-288. The proof given here is taken from B. Simon, Trace
Ideals. T. Ando, Concavity of certain maps on positive definite matrices and
applications to Hadamard products, Linear Algebra Appl., 26(1979)
203-241, takes a different approach. Using the concept of operator means, he
first proves Theorem IX.6.3 (and its generalisation in Problem IX.8.15) and
then deduces Lieb's Theorem from it. The proof given in Problem IX.8.17
is taken from D. Petz, Quasi-entropies for finite quantum systems, Rep.
Math. Phys., 21(1986) 57-65. Theorem IX.6.5 was proved by G. Lindblad,
Entropy, information and quantum measurements, Commun. Math. Phys.,
33(1973) 305-322. Our proof is taken from A. Connes and E. Størmer,
Entropy for automorphisms of II$_1$ von Neumann algebras, Acta Math.,
134(1975) 289-306. The reader would have guessed from the titles of these 
papers that these inequalities are useful in physics. The book Quantum En- 
tropy and Its Use by M. Ohya and D. Petz, Springer-Verlag, 1993, contains 
a very detailed study of such inequalities. Another pertinent reference is 
D. Ruelle, Statistical Mechanics, Benjamin, 1969. The inequalities in
Problem IX.8.16 are taken from T. Ando, Inequalities for permanents, Hokkaido
Math. J., 10(1981) 18-36, and R. Bhatia and C. Davis, Concavity of certain 
functions of matrices, Linear and Multilinear Algebra, 17 (1985) 155-164. 
Theorems IX.7.1 and IX.7.2 were proved in K. Fan and A.J. Hoffman, 
Some metric inequalities in the space of matrices, Proc. Amer. Math. Soc., 
6(1955) 111-116. The inequalities in Problem IX.8.19 were also proved in 
this paper. The result in Problem IX.8.18 is due to C.-K. Li and N.-K. 
Tsing, On the unitarily invariant norms and some related results, Lin- 
ear and Multilinear Algebra, 20 (1987) 107-119. Two papers by P.R. Hal- 
mos, Positive approximants of operators, Indiana Univ. Math. J. 21(1972) 
951-960, and Spectral approximants of normal operators, Proc. Edinburgh
Math. Soc., 19 (1974) 51-58, made the problem of operator approximation 
popular among operator theorists. The results in Theorem IX.7.3,
Exercise IX.7.4, and in Problem IX.8.21, were proved in these papers for the
special case of the operator norm (but more generally for Hilbert space 
operators). The first paper of Halmos also tackles the problem of finding a 
positive approximant to an arbitrary operator, in the operator norm. The 
solution is different from the one for the Hilbert-Schmidt norm given in 
Theorem IX.7.5, and the problem is much more complicated. The problem
of finding the closest normal matrix has been solved completely only in 
the 2 x 2 case. Some properties of the normal approximant and algorithms 
for finding it are given in A. Ruhe, Closest normal matrix finally found!
BIT, 27 (1987) 585-598. The result in Problem IX.8.20 was proved, in the 
special case of the operator norm, by J. Phillips, Nearest normal
approximation for certain normal operators, Proc. Amer. Math. Soc., 67 (1977)
236-240. The general result was proved in R. Bhatia, R. Horn, and F. Kit- 
taneh, Normal approximants to binormal operators, Linear Algebra Appl., 
147(1991) 169-179. An excellent survey of matrix approximation problems, 
with many references and applications, can be found in N.J. Higham,
Matrix nearness problems and applications, in the collection Applications of
Matrix Theory, Oxford University Press, 1989. A particularly striking ap- 
plication of Theorem IX.7.2 has been found in quantum chemistry. Given 
$n$ linearly independent unit vectors $e_1, \ldots, e_n$ in an $n$-dimensional
Hilbert space, what is the orthonormal basis $f_1, \ldots, f_n$ that is closest
to the $e_j$, in the sense that $\sum_j \|e_j - f_j\|^2$ is minimal? The
Gram-Schmidt procedure does not lead to such an orthonormal basis. The chemist
P.O. Löwdin, On the non-orthogonality problem connected with the use of atomic
wave functions in the theory of molecules and crystals, J. Chem. Phys.,
18(1950) 365-374, found a procedure to obtain such a basis. The problem is
clearly equivalent to that of finding a unitary matrix closest to an invertible
matrix, in the Hilbert-Schmidt norm. Theorem IX.7.2 solves the problem for all
unitarily invariant norms. The importance of such results is explained in J.A.
Goldstein and M. Levy, Linear algebra and quantum chemistry, American
Math. Monthly, 98 (1991) 710-718.


X


Perturbation of Matrix Functions 


In earlier chapters we derived several inequalities that describe the variation 
of eigenvalues, eigenvectors, determinants, permanents, and tensor powers 
of a matrix. Similar problems for some other matrix functions are studied 
in this chapter. 


X.1 Operator Monotone Functions 


If $a, b$ are positive real numbers, then it is easy to see that
$|a^r - b^r| \ge |a - b|^r$ if $r \ge 1$, and $|a^r - b^r| \le |a - b|^r$ if
$0 \le r \le 1$. The inequalities in this section are extensions of these
elementary inequalities to positive operators $A, B$. Instead of the power
functions $f(t) = t^r$, $0 \le r \le 1$, we shall consider the more general
class of operator monotone functions.


Theorem X.1.1 Let $f$ be an operator monotone function on $[0, \infty)$ such
that $f(0) = 0$. Then for all positive operators $A, B$,

\[
  \|f(A) - f(B)\| \le f(\|A - B\|). \tag{X.1}
\]

Proof. Since $f$ is concave (Theorem V.2.5) and $f(0) = 0$, we have
$f(a + b) \le f(a) + f(b)$ for all nonnegative numbers $a, b$.

Let $a = \|A - B\|$. Then $A - B \le aI$. Hence, $A \le B + aI$ and
$f(A) \le f(B + aI)$. By the subadditivity property of $f$ mentioned above,
therefore, $f(A) \le f(B) + f(a)I$. Thus $f(A) - f(B) \le f(a)I$ and, by
symmetry, $f(B) - f(A) \le f(a)I$. This implies that
$|f(A) - f(B)| \le f(a)I$. Hence,
$\|f(A) - f(B)\| \le f(a) = f(\|A - B\|)$. ∎

Note the special consequence of the above theorem:

\[
  \|A^r - B^r\| \le \|A - B\|^r, \qquad 0 \le r \le 1, \tag{X.2}
\]

for any two positive operators $A, B$. Note also that the argument in the
above proof shows that

\[
  |||f(A) - f(B)||| \le f(\|A - B\|)\,|||I|||
\]

for every unitarily invariant norm.
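The inequality (X.2) is easy to test numerically in the operator norm. The sketch below samples random positive definite matrices (a convenience, so that fractional powers can be taken from the spectral decomposition without special care at zero) and checks $\|A^r - B^r\| \le \|A - B\|^r$. The function names are hypothetical.

```python
import numpy as np

def random_positive(n, rng):
    """Random Hermitian positive definite matrix (illustrative helper)."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return g @ g.conj().T + 0.1 * np.eye(n)

def mpow(a, r):
    """A^r for a Hermitian positive definite A, via its spectral decomposition."""
    w, v = np.linalg.eigh(a)
    return (v * w**r) @ v.conj().T

def check_power_inequality(r=0.5, n=4, trials=300, seed=0):
    """True if ||A^r - B^r|| <= ||A - B||^r (operator norm) in every trial."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        a, b = random_positive(n, rng), random_positive(n, rng)
        lhs = np.linalg.norm(mpow(a, r) - mpow(b, r), 2)   # largest singular value
        rhs = np.linalg.norm(a - b, 2) ** r
        if lhs > rhs + 1e-10:
            return False
    return True
```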


Exercise X.1.2 Show that for $2 \times 2$ positive matrices, the inequality
$\|A^{1/2} - B^{1/2}\|_2 \le \|A - B\|_2^{1/2}$ is not always valid. (It is
false even when $B = 0$.)

The inequality (X.2) can be rewritten in another form:

\[
  \|A^r - B^r\| \le \|\,|A - B|^r\,\|, \qquad 0 \le r \le 1. \tag{X.3}
\]

This has a generalisation to all unitarily invariant norms. Once again, for
this generalisation, it is convenient to consider the more general class of
operator monotone functions.

Recall that every operator monotone function $f$ on $[0, \infty)$ has an
integral representation

\[
  f(t) = \gamma + \beta t + \int_0^\infty \frac{\lambda t}{\lambda + t}\,dw(\lambda), \tag{X.4}
\]

where $\gamma = f(0)$, $\beta \ge 0$ and $w$ is a positive measure such that
$\int_0^\infty \frac{\lambda}{1 + \lambda}\,dw(\lambda) < \infty$. (See (V.53).)

Theorem X.1.3 Let $f$ be an operator monotone function on $[0, \infty)$ such
that $f(0) = 0$. Then for all positive operators $A, B$ and for all unitarily
invariant norms

\[
  |||f(A) - f(B)||| \le |||f(|A - B|)|||. \tag{X.5}
\]

In the proof of the theorem, we will use the following lemma. 


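Lemma X.1.4 Let $X, Y$ be positive operators. Then

\[
  |||(X + I)^{-1} - (X + Y + I)^{-1}||| \le |||I - (Y + I)^{-1}|||
\]

for every unitarily invariant norm.

Proof. Since $(X + I)^{-1} \le I$, by Lemma V.1.5 we have
$Y^{1/2}(X + I)^{-1}Y^{1/2} \le Y$. Hence,

\[
  I - (Y^{1/2}(X + I)^{-1}Y^{1/2} + I)^{-1} \le I - (Y + I)^{-1}.
\]

Therefore, by the Weyl Monotonicity Principle,

\[
  \lambda_j^{\downarrow}\bigl(I - [Y^{1/2}(X + I)^{-1}Y^{1/2} + I]^{-1}\bigr)
  \le \lambda_j^{\downarrow}\bigl(I - [Y + I]^{-1}\bigr)
\]

for all $j$. Note that $Y^{1/2}(X + I)^{-1}Y^{1/2}$ has the same eigenvalues as
those of $(X + I)^{-1/2}Y(X + I)^{-1/2}$. So, the above inequality can also be
written as

\[
  \lambda_j^{\downarrow}\bigl(I - [(X + I)^{-1/2}Y(X + I)^{-1/2} + I]^{-1}\bigr)
  \le \lambda_j^{\downarrow}\bigl(I - [Y + I]^{-1}\bigr).
\]

From the identity

\[
  (X + I)^{-1} - (X + Y + I)^{-1}
  = (X + I)^{-1/2}\bigl\{I - [(X + I)^{-1/2}Y(X + I)^{-1/2} + I]^{-1}\bigr\}(X + I)^{-1/2}
\]

and the fact that $\|(X + I)^{-1/2}\| \le 1$, we see that

\[
  \lambda_j^{\downarrow}\bigl((X + I)^{-1} - [X + Y + I]^{-1}\bigr)
  \le \lambda_j^{\downarrow}\bigl(I - [(X + I)^{-1/2}Y(X + I)^{-1/2} + I]^{-1}\bigr).
\]

Thus,

\[
  \lambda_j^{\downarrow}\bigl((X + I)^{-1} - [X + Y + I]^{-1}\bigr)
  \le \lambda_j^{\downarrow}\bigl(I - [Y + I]^{-1}\bigr)
\]

for all $j$. This is more than what we need to prove the lemma. ∎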


Proof of Theorem X.1.3: By the Fan Dominance Property (Theorem
IV.2.2) it is sufficient to prove the inequality (X.5) for the special class of
norms $\|\cdot\|_{(k)}$, $k = 1, 2, \ldots, n$.

We will first consider the case when $A - B$ is positive. Let $C = A - B$.
We want to prove that

\[
  \|f(B + C) - f(B)\|_{(k)} \le \|f(C)\|_{(k)}. \tag{X.6}
\]

Let $\sigma_j = s_j(C)$, $j = 1, 2, \ldots, n$. Since $\sigma_j$ are the
eigenvalues of the positive operator $C$, we have

\[
  s_j(h(C)) = h(\sigma_j), \qquad j = 1, \ldots, n,
\]

for every monotonically increasing nonnegative function $h(t)$ on
$[0, \infty)$. Thus

\[
  \|h(C)\|_{(k)} = \sum_{j=1}^{k} h(\sigma_j), \qquad k = 1, \ldots, n,
\]

for all such functions $h$.

Now, $f$ has the representation (X.4) with $\gamma = 0$. The functions
$\beta t$ and $\frac{\lambda t}{\lambda + t}$ are nonnegative, monotonically
increasing functions of $t$. Hence,

\[
\begin{aligned}
  \|f(C)\|_{(k)} &= \sum_{j=1}^{k} f(\sigma_j) \\
  &= \beta \sum_{j=1}^{k} \sigma_j
     + \int_0^\infty \sum_{j=1}^{k} \frac{\lambda\sigma_j}{\lambda + \sigma_j}\,dw(\lambda) \\
  &= \beta\|C\|_{(k)} + \int_0^\infty \lambda\,\|C(C + \lambda I)^{-1}\|_{(k)}\,dw(\lambda).
\end{aligned}
\]

In the same way, we can obtain from the integral representation of $f$

\[
  \|f(B + C) - f(B)\|_{(k)}
  \le \beta\|C\|_{(k)}
    + \int_0^\infty \lambda\,\|(B + C)(B + C + \lambda I)^{-1} - B(B + \lambda I)^{-1}\|_{(k)}\,dw(\lambda).
\]

Thus, our assertion (X.6) will be proved if we show that for each
$\lambda > 0$

\[
  \|(B + C)(B + C + \lambda I)^{-1} - B(B + \lambda I)^{-1}\|_{(k)}
  \le \|C(C + \lambda I)^{-1}\|_{(k)}.
\]

Now note that we can write

\[
  X(X + \lambda I)^{-1} = I - \Bigl(\frac{X}{\lambda} + I\Bigr)^{-1}.
\]

So, the above inequality follows from Lemma X.1.4. This proves the theorem
in the special case when $A - B$ is positive.

To prove the general case we will use the special case proved above and two simple facts. First, if $X, Y$ are Hermitian with positive parts $X^+, Y^+$ in their respective Jordan decompositions, then the inequality $X \le Y$ implies that $\|X^+\|_{(k)} \le \|Y^+\|_{(k)}$ for all $k$. This is an immediate consequence of Weyl's Monotonicity Principle. Second, if $X_1, X_2, Y_1, Y_2$ are positive operators such that $X_1 X_2 = 0$, $Y_1 Y_2 = 0$, $\|X_1\|_{(k)} \le \|Y_1\|_{(k)}$, and $\|X_2\|_{(k)} \le \|Y_2\|_{(k)}$ for all $k$, then we have $\|X_1 + X_2\|_{(k)} \le \|Y_1 + Y_2\|_{(k)}$ for all $k$. This can be easily seen using the fact that since $X_1$ and $X_2$ commute they can be simultaneously diagonalised, and so can $Y_1, Y_2$.

Now let $A, B$ be any two positive operators. Since $A - B \le (A - B)^+$, we have $A \le B + (A - B)^+$, and hence $f(A) \le f(B + (A - B)^+)$. From this we have

$$f(A) - f(B) \le f(B + (A - B)^+) - f(B),$$

and, therefore, by the first observation above,

$$\|[f(A) - f(B)]^+\|_{(k)} \le \|f(B + (A - B)^+) - f(B)\|_{(k)}$$

for all $k$. Then, using the special case of the theorem proved above we can conclude that

$$\|[f(A) - f(B)]^+\|_{(k)} \le \|f([A - B]^+)\|_{(k)}$$

for all $k$. Interchanging $A$ and $B$, we have

$$\|[f(B) - f(A)]^+\|_{(k)} \le \|f([B - A]^+)\|_{(k)}$$

for all $k$. Now note that

$$f([A - B]^+)\, f([B - A]^+) = 0,$$
$$f([A - B]^+) + f([B - A]^+) = f(|A - B|),$$
$$[f(A) - f(B)]^+\, [f(B) - f(A)]^+ = 0,$$
$$[f(A) - f(B)]^+ + [f(B) - f(A)]^+ = |f(A) - f(B)|.$$

Thus, the two inequalities above imply that

$$\|f(A) - f(B)\|_{(k)} \le \|f(|A - B|)\|_{(k)}$$

for all $k$. This proves the theorem. ■


Exercise X.1.5 Show that the conclusion of Theorem X.1.3 is valid for all nonnegative operator monotone functions on $[0, \infty)$; i.e., we can replace the condition $f(0) = 0$ by the condition $f(0) \ge 0$.


One should note two special corollaries of the theorem: we have for all positive operators $A, B$ and for all unitarily invariant norms

$$|||A^r - B^r||| \le |||\,|A - B|^r\,|||, \quad 0 \le r \le 1, \tag{X.7}$$

$$|||\log(I + A) - \log(I + B)||| \le |||\log(I + |A - B|)|||. \tag{X.8}$$
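
A numerical spot check of (X.7) can be sketched as follows. By the Fan Dominance Property it is enough to test the Ky Fan norms, and that is what the sketch does on random positive semidefinite matrices. The helper names, matrix sizes, and tolerance are illustrative assumptions, not part of the text.

```python
import numpy as np

def psd_power(M, r):
    """M^r for a Hermitian positive semidefinite M, via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)          # remove tiny negative round-off
    return (V * w**r) @ V.conj().T

def ky_fan_norms(M):
    """All Ky Fan k-norms of M: partial sums of the singular values."""
    return np.cumsum(np.linalg.svd(M, compute_uv=False))

def check_x7(A, B, r, tol=1e-9):
    """True if ||A^r - B^r||_(k) <= || |A-B|^r ||_(k) for every k."""
    lhs = ky_fan_norms(psd_power(A, r) - psd_power(B, r))
    s = np.linalg.svd(A - B, compute_uv=False)   # singular values of A - B
    rhs = np.cumsum(s**r)                        # Ky Fan norms of |A - B|^r
    return bool(np.all(lhs <= rhs + tol))

rng = np.random.default_rng(0)
results = []
for _ in range(200):
    G, H = rng.standard_normal((2, 4, 4))
    A, B = G @ G.T, H @ H.T            # random positive semidefinite matrices
    results.append(check_x7(A, B, r=rng.uniform(0.0, 1.0)))
all_ok = all(results)
```

Such a check cannot replace the proof, but it is a convenient way to catch a misread exponent or norm.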


Theorem X.1.6 Let $g$ be a continuous strictly increasing map of $[0, \infty)$ onto itself. Suppose that the inverse map $g^{-1}$ is operator monotone. Then for all positive operators $A, B$ and for all unitarily invariant norms, we have

$$|||g(A) - g(B)||| \ge |||g(|A - B|)|||. \tag{X.9}$$


Proof. Let $f = g^{-1}$. Since $f$ is operator monotone, it is concave by Theorem V.2.5. Hence $g$ is convex. From Theorem X.1.3, with $g(A)$ and $g(B)$ in place of $A$ and $B$, we have

$$|||A - B||| \le |||f(|g(A) - g(B)|)|||.$$

This is equivalent to the weak majorisation

$$\{s_j(A - B)\} \prec_w \{s_j(f(|g(A) - g(B)|))\}.$$

Since $f$ is monotone,

$$s_j(f(|g(A) - g(B)|)) = f(s_j(g(A) - g(B)))$$

for each $j$. So, we have

$$\{s_j(A - B)\} \prec_w \{f(s_j(g(A) - g(B)))\}.$$

Since $g$ is convex and monotone, by Corollary II.3.4, we have from this

$$\{g(s_j(A - B))\} \prec_w \{s_j(g(A) - g(B))\}.$$

Since $g$ is monotone, this is the same as saying that

$$\{s_j(g(|A - B|))\} \prec_w \{s_j(g(A) - g(B))\},$$

and this, in turn, implies the inequality (X.9). ■


Two special corollaries that complement the inequalities (X.7) and (X.8) are worthy of note. For all positive operators $A, B$ and for all unitarily invariant norms,

$$|||A^r - B^r||| \ge |||\,|A - B|^r\,|||, \quad \text{if } r \ge 1, \tag{X.10}$$

$$|||\exp A - \exp B||| \ge |||\exp(|A - B|) - I|||. \tag{X.11}$$


Exercise X.1.7 Derive the inequality (X.10) from (X.7) using Exercise 
IV.2.8. 


Is there an inequality like (X.10) for Hermitian operators $A, B$? First note that if $A$ is Hermitian, only positive integral powers of $A$ are meaningfully defined. So, the question is whether $|||(A - B)^m|||$ can be bounded above by $|||A^m - B^m|||$. No such bound is possible if $m$ is an even integer; for the choice $B = -A$, we have $A^m - B^m = 0$. For odd integers $m$, we do have a satisfactory answer.


Theorem X.1.8 Let $A, B$ be Hermitian operators. Then for all unitarily invariant norms, and for $m = 1, 2, \ldots$,

$$|||(A - B)^{2m+1}||| \le 2^{2m}\, |||A^{2m+1} - B^{2m+1}|||. \tag{X.12}$$


For the proof we need the following lemma. 


Lemma X.1.9 Let $A, B$ be Hermitian operators, let $X$ be any operator, and let $m$ be any positive integer. Then, for $j = 1, 2, \ldots, m$,

$$|||A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j}||| \le |||A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1}|||. \tag{X.13}$$

Proof. By the arithmetic-geometric mean inequality proved in Theorem IX.4.5, we have

$$|||AXB||| \le \tfrac{1}{2}\, |||A^2 X + X B^2|||$$

for all operators $X$ and Hermitian $A, B$. We will use this to prove the inequality (X.13). First consider the case $j = 1$. We have

$$|||A^{m+1} X B^{m} - A^{m} X B^{m+1}||| = |||A(A^{m} X B^{m-1} - A^{m-1} X B^{m})B|||$$
$$\le \tfrac{1}{2}\, |||A^2(A^{m} X B^{m-1} - A^{m-1} X B^{m}) + (A^{m} X B^{m-1} - A^{m-1} X B^{m})B^2|||$$
$$\le \tfrac{1}{2}\, |||A^{m+2} X B^{m-1} - A^{m-1} X B^{m+2}||| + \tfrac{1}{2}\, |||A^{m+1} X B^{m} - A^{m} X B^{m+1}|||.$$

Hence

$$|||A^{m+1} X B^{m} - A^{m} X B^{m+1}||| \le |||A^{m+2} X B^{m-1} - A^{m-1} X B^{m+2}|||.$$

This shows that the inequality (X.13) is true for $j = 1$. The general case is proved by induction. Suppose that the inequality (X.13) has been proved for $j - 1$ in place of $j$. Then using the arithmetic-geometric mean inequality, the triangle inequality, and this induction hypothesis, we have

$$|||A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j}||| = |||A(A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1})B|||$$
$$\le \tfrac{1}{2}\, |||A^2(A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1}) + (A^{m+j-1} X B^{m-j} - A^{m-j} X B^{m+j-1})B^2|||$$
$$\le \tfrac{1}{2}\, |||A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1}||| + \tfrac{1}{2}\, |||A^{m+(j-1)} X B^{m-(j-1)+1} - A^{m-(j-1)+1} X B^{m+(j-1)}|||$$
$$\le \tfrac{1}{2}\, |||A^{m+j+1} X B^{m-j} - A^{m-j} X B^{m+j+1}||| + \tfrac{1}{2}\, |||A^{m+j} X B^{m-j+1} - A^{m-j+1} X B^{m+j}|||.$$

This proves the desired inequality. ■


Proof of Theorem X.1.8: Using the triangle inequality and a very special case of Lemma X.1.9, we have

$$|||A^{2m}(A - B) + (A - B)B^{2m}||| \le |||A^{2m+1} - B^{2m+1}||| + |||A^{2m}B - AB^{2m}||| \le 2\, |||A^{2m+1} - B^{2m+1}|||. \tag{X.14}$$

Let $C = A - B$ and choose an orthonormal basis $x_j$ such that $Cx_j = \lambda_j x_j$, where $|\lambda_j| = s_j(C)$ for $j = 1, 2, \ldots, n$. Then, by the extremal representation of the Ky Fan norms given in Problem III.6.6,

$$\|A^{2m}C + CB^{2m}\|_{(k)} \ge \sum_{j=1}^{k} |\langle x_j, (A^{2m}C + CB^{2m})x_j \rangle| = \sum_{j=1}^{k} |\lambda_j| \{\langle x_j, A^{2m}x_j \rangle + \langle x_j, B^{2m}x_j \rangle\}.$$

Now use the observation in Problem IX.8.14 and the convexity of the function $f(t) = t^{2m}$ to see that

$$\langle x_j, A^{2m}x_j \rangle + \langle x_j, B^{2m}x_j \rangle \ge \|Ax_j\|^{2m} + \|Bx_j\|^{2m} \ge 2^{1-2m}(\|Ax_j\| + \|Bx_j\|)^{2m} \ge 2^{1-2m}\|Ax_j - Bx_j\|^{2m} = 2^{1-2m}|\lambda_j|^{2m}.$$

We thus have

$$\|A^{2m}C + CB^{2m}\|_{(k)} \ge \sum_{j=1}^{k} 2^{1-2m}|\lambda_j|^{2m+1} = 2^{1-2m}\|(A - B)^{2m+1}\|_{(k)}.$$

Since this is true for all $k$, we have

$$|||A^{2m}C + CB^{2m}||| \ge 2^{1-2m}\, |||(A - B)^{2m+1}|||$$

for all unitarily invariant norms. Combining this with (X.14) we obtain the inequality (X.12). ■

Observe that when $B = -A$, the two sides of (X.12) are equal.
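
The inequality (X.12) and its equality case $B = -A$ can be spot-checked numerically. The sketch below tests the Ky Fan norms on random Hermitian matrices; the helper names, sizes, and tolerances are illustrative assumptions.

```python
import numpy as np

def ky_fan_norms(M):
    """All Ky Fan k-norms of M (partial sums of singular values, largest first)."""
    return np.cumsum(np.linalg.svd(M, compute_uv=False))

def check_x12(A, B, m, rtol=1e-9):
    """Test (X.12) in every Ky Fan norm for Hermitian A, B."""
    p = 2 * m + 1
    mp = np.linalg.matrix_power
    lhs = ky_fan_norms(mp(A - B, p))
    rhs = 2.0 ** (2 * m) * ky_fan_norms(mp(A, p) - mp(B, p))
    return bool(np.all(lhs <= rhs * (1 + rtol) + rtol))

def random_hermitian(rng, n):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(3)
random_ok = all(
    check_x12(random_hermitian(rng, 4), random_hermitian(rng, 4), m)
    for m in (1, 2, 3) for _ in range(100)
)

# Equality case B = -A with m = 1: both sides equal 8 times the norm of A^3.
A = random_hermitian(rng, 4)
cube = np.linalg.matrix_power
equality_gap = ky_fan_norms(cube(2 * A, 3)) - 4 * ky_fan_norms(2 * cube(A, 3))
```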


X.2. The Absolute Value 


In Section VII.5 we obtained bounds for $|||\,|A| - |B|\,|||$ in terms of $|||A - B|||$. More such bounds are obtained in this section. Since $|A| = (A^*A)^{1/2}$, results for the square root function obtained in the preceding section are useful here.


Theorem X.2.1 Let $A, B$ be any two operators. Then, for every unitarily invariant norm,

$$|||\,|A| - |B|\,||| \le \sqrt{2}\, \left(|||A + B|||\; |||A - B|||\right)^{1/2}. \tag{X.15}$$

Proof. From the inequality (X.7) we have

$$|||\,|A| - |B|\,||| \le |||\,|A^*A - B^*B|^{1/2}\,|||. \tag{X.16}$$

Note that

$$A^*A - B^*B = \tfrac{1}{2}\{(A + B)^*(A - B) + (A - B)^*(A + B)\}. \tag{X.17}$$

Hence, by Theorem III.5.6, we can find unitaries $U$ and $V$ such that

$$|A^*A - B^*B| \le \tfrac{1}{2}\{U|(A + B)^*(A - B)|U^* + V|(A - B)^*(A + B)|V^*\}.$$

Since the square root function is operator monotone, this operator inequality is preserved if we take square roots on both sides. Since every unitarily invariant norm is monotone, this shows that

$$|||\,|A^*A - B^*B|^{1/2}\,|||^2 \le \tfrac{1}{2}\, |||\,[U|(A + B)^*(A - B)|U^* + V|(A - B)^*(A + B)|V^*]^{1/2}\,|||^2.$$

By the result of Problem IV.5.6, we have

$$|||\,|X + Y|^{1/2}\,|||^2 \le 2\left(|||\,|X|^{1/2}\,|||^2 + |||\,|Y|^{1/2}\,|||^2\right)$$

for all $X, Y$. Hence,

$$|||\,|A^*A - B^*B|^{1/2}\,|||^2 \le |||\,|(A + B)^*(A - B)|^{1/2}\,|||^2 + |||\,|(A - B)^*(A + B)|^{1/2}\,|||^2.$$

By the Cauchy-Schwarz inequality (IV.44), the right-hand side is bounded above by $2\,|||A + B|||\;|||A - B|||$. Thus the inequality (X.15) now follows from (X.16). ■


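Example X.2.2 Let

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

Then

$$\|\,|A| - |B|\,\|_1 = 2, \quad \|A + B\|_1 = \|A - B\|_1 = \sqrt{2}.$$

So, for the trace norm the inequality (X.15) is sharp.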
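
The numbers in Example X.2.2 are easy to confirm numerically. The sketch below takes the matrices to be $A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ (the reading consistent with the stated norms) and computes both sides of (X.15) in the trace norm; the helper names are illustrative.

```python
import numpy as np

def abs_matrix(M):
    """|M| = (M* M)^{1/2}, computed from the SVD M = W diag(s) Vh."""
    _, s, Vh = np.linalg.svd(M)
    return (Vh.conj().T * s) @ Vh

def trace_norm(M):
    """Trace norm: the sum of the singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[0.0, 1.0], [0.0, 0.0]])

lhs = trace_norm(abs_matrix(A) - abs_matrix(B))                       # || |A| - |B| ||_1
rhs = np.sqrt(2.0) * np.sqrt(trace_norm(A + B) * trace_norm(A - B))   # right side of (X.15)
```

Both `lhs` and `rhs` come out equal to 2 up to rounding, which is the sharpness claimed in the example.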


An improvement of this inequality is possible for special norms. 


Theorem X.2.3 Let $A, B$ be any two operators. Then for every Q-norm (and thus, in particular, for every Schatten $p$-norm with $p \ge 2$) we have

$$\|\,|A| - |B|\,\|_Q \le \left(\|A + B\|_Q\, \|A - B\|_Q\right)^{1/2}. \tag{X.18}$$


Proof. By the definition of Q-norms, the inequality (X.18) is equivalent to the assertion that

$$|||(|A| - |B|)^2||| \le |||\,|A + B|^2\,|||^{1/2}\; |||\,|A - B|^2\,|||^{1/2} \tag{X.19}$$

for all unitarily invariant norms. From the inequality (X.10) we have

$$|||(|A| - |B|)^2||| \le |||\,|A|^2 - |B|^2\,||| = |||A^*A - B^*B|||.$$

Using the identity (X.17), we see that

$$|||A^*A - B^*B||| \le \tfrac{1}{2}\{|||(A + B)^*(A - B)||| + |||(A - B)^*(A + B)|||\}.$$

Now, using the Cauchy-Schwarz inequality given in Problem IV.5.7, we see that each of the two terms in the brackets above is dominated by

$$|||\,|A + B|^2\,|||^{1/2}\; |||\,|A - B|^2\,|||^{1/2}.$$

This proves the inequality (X.19). ■


Theorem X.2.4 Let $A, B$ be any two operators. Then, for all Schatten $p$-norms with $1 \le p \le 2$,

$$\|\,|A| - |B|\,\|_p \le 2^{1/p - 1/2} \left(\|A + B\|_p\, \|A - B\|_p\right)^{1/2}. \tag{X.20}$$

Proof. Let

$$\|X\|_p := \left(\sum_j s_j^p(X)\right)^{1/p}, \quad \text{for all } p > 0.$$

When $p \ge 1$, these are the Schatten $p$-norms. When $0 < p < 1$, this defines a quasi-norm. Instead of the triangle inequality, we have

$$\|X + Y\|_p \le 2^{1/p - 1}\left(\|X\|_p + \|Y\|_p\right), \quad 0 < p < 1. \tag{X.21}$$

(See Problems IV.5.1 and IV.5.6.) Note that for all positive real numbers $r$ and $p$, we have

$$\|\,|X|^r\,\|_p = \|X\|_{rp}^r. \tag{X.22}$$

Thus, the inequality (X.7), restricted to the $p$-norms, gives for all positive operators $A, B$

$$\|A^r - B^r\|_p \le \|A - B\|_{rp}^r, \quad \text{for } 0 \le r \le 1,\ 1 \le p \le \infty. \tag{X.23}$$

Hence, for any two operators $A, B$,

$$\|\,|A| - |B|\,\|_p \le \|A^*A - B^*B\|_{p/2}^{1/2}, \quad 1 \le p \le \infty.$$

Now use the identity (X.17) and the property (X.21) to see that, for $1 \le p \le 2$,

$$\|A^*A - B^*B\|_{p/2} \le 2^{2/p - 2}\left\{\|(A + B)^*(A - B)\|_{p/2} + \|(A - B)^*(A + B)\|_{p/2}\right\}.$$

From the relation (X.22) and the Cauchy-Schwarz inequality (IV.44), it follows that each of the two terms in the brackets is dominated by $\|A + B\|_p \|A - B\|_p$. Hence

$$\|A^*A - B^*B\|_{p/2} \le 2^{2/p - 1}\, \|A + B\|_p\, \|A - B\|_p$$

for $1 \le p \le 2$. This proves the theorem. ■


The example given in X.2.2 shows that, for each $1 \le p \le 2$, the inequality (X.20) is sharp.


In Section VII.5 we saw that

$$\|\,|A| - |B|\,\|_2 \le \sqrt{2}\, \|A - B\|_2 \tag{X.24}$$

for any two operators $A, B$. Further, if both $A$ and $B$ are normal, the factor $\sqrt{2}$ can be replaced by 1. Can one prove a similar inequality for the operator norm instead of the Hilbert-Schmidt norm? Of course, we have from (X.24)

$$\|\,|A| - |B|\,\| \le \sqrt{2n}\, \|A - B\| \tag{X.25}$$

for all operators $A, B$ on an $n$-dimensional Hilbert space $\mathcal{H}$. It is known that the factor $\sqrt{2n}$ in the above inequality can be replaced by a factor $c_n = O(\log n)$; and even when both $A, B$ are Hermitian, such a factor is necessary. (See the Notes at the end of the chapter.)

In a slightly different vein, we have the following theorem. 


Theorem X.2.5 (T. Kato) For any two operators $A, B$ we have

$$\|\,|A| - |B|\,\| \le \frac{2}{\pi}\, \|A - B\| \left(2 + \log \frac{\|A\| + \|B\|}{\|A - B\|}\right). \tag{X.26}$$

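Proof. The square root function has an integral representation (V.4); this says that

$$t^{1/2} = \frac{1}{\pi} \int_0^\infty \frac{t}{\lambda + t}\, \lambda^{-1/2}\, d\lambda.$$

We can rewrite this as

$$t^{1/2} = \frac{1}{\pi} \int_0^\infty \left[\lambda^{-1/2} - \lambda^{1/2}(\lambda + t)^{-1}\right] d\lambda.$$

Using this, we have

$$|A| - |B| = \frac{1}{\pi} \int_0^\infty \lambda^{1/2}\left[(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1}\right] d\lambda. \tag{X.27}$$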

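We will estimate the norm of this integral by splitting it into three parts. Let

$$\alpha = \|A - B\|^2, \quad \beta = (\|A\| + \|B\|)^2.$$

Now note that if $X, Y$ are two positive operators, then $-Y \le X - Y \le X$, and hence $\|X - Y\| \le \max(\|X\|, \|Y\|)$. Using this we see that

$$\left\| \int_0^\alpha \lambda^{1/2}\left[(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1}\right] d\lambda \right\| \le \int_0^\alpha \lambda^{-1/2}\, d\lambda = 2\alpha^{1/2} = 2\|A - B\|. \tag{X.28}$$

From the identity

$$(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} = (|B|^2 + \lambda)^{-1}(|A|^2 - |B|^2)(|A|^2 + \lambda)^{-1} \tag{X.29}$$

and the identity (X.17), we see that

$$\|(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1}\| \le \lambda^{-2}\|A + B\|\, \|A - B\| \le \lambda^{-2}\beta^{1/2}\|A - B\|.$$

Hence,

$$\left\| \int_\beta^\infty \lambda^{1/2}\left[(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1}\right] d\lambda \right\| \le 2\|A - B\|. \tag{X.30}$$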
B 


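Since $A^*A - B^*B = B^*(A - B) + (A^* - B^*)A$, from (X.29) we have

$$(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1} = (|B|^2 + \lambda)^{-1}B^*(A - B)(|A|^2 + \lambda)^{-1} + (|B|^2 + \lambda)^{-1}(A^* - B^*)A(|A|^2 + \lambda)^{-1}. \tag{X.31}$$

Note that

$$\|(|B|^2 + \lambda)^{-1}B^*\| = \|B(|B|^2 + \lambda)^{-1}\| = \|\,|B|(|B|^2 + \lambda)^{-1}\| \le \frac{1}{2\lambda^{1/2}},$$

since the maximum value of the function $f(t) = \frac{t}{t^2 + \lambda}$ is $\frac{1}{2\lambda^{1/2}}$. By the same argument,

$$\|A(|A|^2 + \lambda)^{-1}\| \le \frac{1}{2\lambda^{1/2}}.$$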


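So, from (X.31) we obtain

$$\|(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1}\| \le \lambda^{-3/2}\|A - B\|.$$

Hence

$$\left\| \int_\alpha^\beta \lambda^{1/2}\left[(|B|^2 + \lambda)^{-1} - (|A|^2 + \lambda)^{-1}\right] d\lambda \right\| \le \|A - B\| \int_\alpha^\beta \lambda^{-1}\, d\lambda = \|A - B\| \log \frac{\beta}{\alpha} = 2\|A - B\| \log \frac{\|A\| + \|B\|}{\|A - B\|}. \tag{X.32}$$

Combining (X.27), (X.28), (X.30), and (X.32), we obtain the inequality (X.26). ■
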
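
Kato's bound (X.26) can be spot-checked on random matrices. The following is only a numerical sanity check in the operator norm; the helper names, the matrix size, and the random seed are illustrative assumptions.

```python
import numpy as np

def abs_matrix(M):
    """|M| = (M* M)^{1/2}, computed from the SVD M = W diag(s) Vh."""
    _, s, Vh = np.linalg.svd(M)
    return (Vh.conj().T * s) @ Vh

def op_norm(M):
    """Operator norm: the largest singular value."""
    return np.linalg.norm(M, 2)

def kato_bound(A, B):
    """Right-hand side of (X.26); requires A != B."""
    d = op_norm(A - B)
    return (2 / np.pi) * d * (2 + np.log((op_norm(A) + op_norm(B)) / d))

rng = np.random.default_rng(5)

def random_matrix(n):
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

kato_ok = True
for _ in range(200):
    A, B = random_matrix(3), random_matrix(3)
    lhs = op_norm(abs_matrix(A) - abs_matrix(B))
    kato_ok = kato_ok and bool(lhs <= kato_bound(A, B) + 1e-12)
```

Note that the logarithm is never negative, since $\|A - B\| \le \|A\| + \|B\|$.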
Inequalities obtained above are global, in the sense that they are valid
for all pairs of operators A, B. Some special results can be obtained if B is 
restricted to be close to A, or when both are restricted to be away from 0. It 
is possible to derive many interesting inequalities by using only elementary 
calculus on normed spaces. A quick review of the basic concepts of the 
Fréchet differential calculus that are used below is given in the Appendix. 

Let $f$ be any continuously differentiable map on an open interval $I$. Then the map that $f$ induces on the set of Hermitian matrices whose spectrum is contained in $I$ is Fréchet differentiable. This has been proved in Theorem V.3.3, and an explicit formula for the derivative is also given there. For each $A$, the derivative $Df(A)$ is a linear operator on the space of all Hermitian matrices. The norm of this operator is defined as

$$\|Df(A)\| = \sup_{\|B\| = 1} \|Df(A)(B)\|. \tag{X.33}$$

More generally, any unitarily invariant norm on Hermitian matrices leads to a corresponding norm for the linear operator $Df(A)$; we denote this as

$$|||Df(A)||| = \sup_{|||B||| = 1} |||Df(A)(B)|||. \tag{X.34}$$


For some special functions $f$, we will find upper bounds for these quantities. Among the functions we consider are operator monotone functions on $(0, \infty)$. The square root function $f(t) = t^{1/2}$ is easier to handle, and since it is especially important, it is worthwhile to deal with it separately.


Theorem X.3.1 Let $f(t) = t^{1/2}$, $0 < t < \infty$. Then for every positive operator $A$, and for every unitarily invariant norm,

$$|||Df(A)||| \le \tfrac{1}{2}\, \|A^{-1}\|^{1/2}. \tag{X.35}$$


Proof. The function $g(t) = t^2$, $0 < t < \infty$, is the inverse of $f$. So, by the chain rule of differentiation, $Df(A) = [Dg(f(A))]^{-1}$ for every positive operator $A$. Note that $Dg(A)(X) = AX + XA$, for every $X$. So

$$[Dg(f(A))](X) = A^{1/2}X + XA^{1/2}.$$

If $A$ has eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n > 0$, then $\operatorname{dist}(\sigma(A^{1/2}), \sigma(-A^{1/2})) = 2\alpha_n^{1/2} = 2\|A^{-1}\|^{-1/2}$. Hence, by Theorem VII.2.12,

$$|||[Dg(f(A))]^{-1}||| \le \tfrac{1}{2}\, \|A^{-1}\|^{1/2}.$$

This proves the theorem. ■
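
The proof identifies $Df(A)(X)$ for the square root with the solution $Y$ of the equation $A^{1/2}Y + YA^{1/2} = X$. In an eigenbasis of $A$ this equation can be solved entry by entry, which gives a simple way to compute the derivative and to test the bound (X.35). The sketch below does this in the operator norm and compares with a central difference; all names, sizes, and tolerances are illustrative assumptions.

```python
import numpy as np

def sqrt_psd(A):
    """A^{1/2} for a Hermitian positive definite A."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.conj().T

def dsqrt(A, X):
    """Df(A)(X) for f(t) = t^{1/2}: the solution Y of A^{1/2} Y + Y A^{1/2} = X."""
    w, V = np.linalg.eigh(A)
    s = np.sqrt(w)
    Xt = V.conj().T @ X @ V                 # X in the eigenbasis of A
    Yt = Xt / (s[:, None] + s[None, :])     # entrywise solution of the equation
    return V @ Yt @ V.conj().T

rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4))
A = G @ G.T + np.eye(4)                     # positive definite, A >= I
H = rng.standard_normal((4, 4))
X = H + H.T                                 # Hermitian direction

Y = dsqrt(A, X)
eps = 1e-6
fd = (sqrt_psd(A + eps * X) - sqrt_psd(A - eps * X)) / (2 * eps)
fd_error = np.linalg.norm(Y - fd, 2)        # should be tiny

op = lambda M: np.linalg.norm(M, 2)
bound_ok = op(Y) <= 0.5 * np.sqrt(op(np.linalg.inv(A))) * op(X) + 1e-12
```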


Exercise X.3.2 Let $f \in C^1(I)$ and let $f'$ be the derivative of $f$. Show that

$$\|f'(A)\| = \|Df(A)(I)\| \le \|Df(A)\|. \tag{X.36}$$

Thus, for the function $f(t) = t^{1/2}$ on $(0, \infty)$,

$$\|Df(A)\| = \|f'(A)\| \tag{X.37}$$

for all positive operators $A$.


Theorem X.3.3 Let $\varphi$ be the map that takes an invertible operator $A$ to its absolute value $|A|$. Then, for every unitarily invariant norm,

$$|||D\varphi(A)||| \le \operatorname{cond}(A) = \|A^{-1}\|\, \|A\|. \tag{X.38}$$


Proof. Let $g(A) = A^*A$. Then $Dg(A)(B) = A^*B + B^*A$. Hence $|||Dg(A)||| \le 2\|A\|$. The map $\varphi$ is the composite $fg$, where $f(A) = A^{1/2}$. So, by the chain rule, $D\varphi(A) = Df(g(A))Dg(A) = Df(A^*A)Dg(A)$. Hence

$$|||D\varphi(A)||| \le |||Df(A^*A)|||\; |||Dg(A)|||.$$

The first term on the right is bounded by $\tfrac{1}{2}\|A^{-1}\|$ by Theorem X.3.1 (note that $\|(A^*A)^{-1}\|^{1/2} = \|A^{-1}\|$), and the second by $2\|A\|$. This proves the theorem. ■


The following theorem generalises Theorem X.3.1. 


Theorem X.3.4 Let $f$ be an operator monotone function on $(0, \infty)$. Then, for every unitarily invariant norm,

$$|||Df(A)||| \le \|f'(A)\| \tag{X.39}$$

for all positive operators $A$.


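Proof. Use the integral representation (V.49) to write

$$f(t) = \alpha + \beta t + \int_0^\infty \left(\frac{\lambda}{\lambda^2 + 1} - \frac{1}{\lambda + t}\right) d\mu(\lambda),$$

where $\alpha, \beta$ are real numbers, $\beta \ge 0$, and $\mu$ is a positive measure. Thus

$$f(A) = \alpha I + \beta A + \int_0^\infty \left[\frac{\lambda}{\lambda^2 + 1}\, I - (\lambda + A)^{-1}\right] d\mu(\lambda).$$

Using the fact that, for the function $g(A) = A^{-1}$ we have $Dg(A)(B) = -A^{-1}BA^{-1}$, we obtain from the above expression

$$Df(A)(B) = \beta B + \int_0^\infty (\lambda + A)^{-1}B(\lambda + A)^{-1}\, d\mu(\lambda).$$

Hence

$$|||Df(A)||| \le \beta + \int_0^\infty \|(\lambda + A)^{-1}\|^2\, d\mu(\lambda). \tag{X.40}$$

From the integral representation we also have

$$f'(t) = \beta + \int_0^\infty \frac{1}{(\lambda + t)^2}\, d\mu(\lambda).$$

Hence

$$\|f'(A)\| = \left\|\beta I + \int_0^\infty (\lambda + A)^{-2}\, d\mu(\lambda)\right\|. \tag{X.41}$$

If $A$ has eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n$, then since $\beta \ge 0$, the right-hand sides of both (X.40) and (X.41) are equal to

$$\beta + \int_0^\infty (\lambda + \alpha_n)^{-2}\, d\mu(\lambda).$$

This proves the theorem. ■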


Exercise X.3.5 Let $f$ be an operator monotone function on $(0, \infty)$. Show that

$$\|Df(A)\| = \|Df(A)(I)\| = \|f'(A)\|.$$


Once we have estimates for the derivative D f(A), we can obtain bounds 
for || f(A) — f(B)|| when B is close to A. These bounds are obtained using 
Taylor’s Theorem and the mean value theorem. 

Using Taylor's Theorem, we obtain from Theorems X.3.3 and X.3.4 above the following.

Theorem X.3.6 Let $A$ be an invertible operator. Then for every unitarily invariant norm

$$|||\,|A| - |B|\,||| \le \operatorname{cond}(A)\, |||A - B||| + O(|||A - B|||^2) \tag{X.42}$$

for all $B$ close to $A$.


Theorem X.3.7 Let $f$ be an operator monotone function on $(0, \infty)$. Let $A$ be any positive operator. Then for every unitarily invariant norm

$$|||f(A) - f(B)||| \le \|f'(A)\|\; |||A - B||| + O(|||A - B|||^2) \tag{X.43}$$

for all positive operators $B$ close to $A$.

For the functions $f(t) = t^r$, $0 < r < 1$, we have from this

$$|||A^r - B^r||| \le r\, \|A^{-1}\|^{1-r}\; |||A - B||| + O(|||A - B|||^2). \tag{X.44}$$


The use of the mean value theorem is illustrated in the proof of the 
following theorem. 


Theorem X.3.8 Let $f$ be an operator monotone function on $(0, \infty)$ and let $A, B$ be two positive operators that are bounded below by $a$; i.e., $A \ge aI$ and $B \ge aI$ for the positive number $a$. Then for every unitarily invariant norm

$$|||f(A) - f(B)||| \le f'(a)\, |||A - B|||. \tag{X.45}$$


Proof. Use the integral representation of $f$ as in the proof of Theorem X.3.4. We have

$$f'(A) = \beta I + \int_0^\infty (\lambda + A)^{-2}\, d\mu(\lambda).$$

If $A \ge aI$, then

$$f'(A) \le \beta I + \left[\int_0^\infty (\lambda + a)^{-2}\, d\mu(\lambda)\right] I = f'(a)I.$$

Let $A(t) = (1 - t)A + tB$, $0 \le t \le 1$. If $A$ and $B$ are bounded below by $a$, then so is $A(t)$. Hence, using the mean value theorem, the inequality (X.39), and the above observation, we have

$$|||f(A) - f(B)||| \le \sup_{0 \le t \le 1} |||Df(A(t))(A'(t))||| \le \sup_{0 \le t \le 1} \|f'(A(t))\|\; |||A'(t)||| \le f'(a)\, |||A - B|||.$$

A special corollary is the inequality

$$|||A^r - B^r||| \le r a^{r-1}\, |||A - B|||, \quad 0 \le r \le 1, \tag{X.46}$$

valid for operators $A, B$ such that $A \ge aI$ and $B \ge aI$ for some positive number $a$.

Other inequalities of this type are discussed in the Problems section. 

Let $A = UP$ be the polar decomposition of an invertible matrix $A$. Then $P = |A|$ and $U = AP^{-1}$. Using standard rules of differentiation, one can obtain from this, expressions for the derivative of the map $A \mapsto U$, and then obtain perturbation bounds for this map in the same way as was done for the map $A \mapsto |A|$. There is, however, a more effective and simpler way of doing this.

The advantage of this new method, explained below, is that it also works 
for other decompositions like the QR decomposition, where explicit formu- 
lae for the two factors are not known. For this added power there is a small 
cost to be paid. The slightly more sophisticated notion of differentiation on 
a manifold of matrices has to be used. We have already used similar ideas 
in Chapter 6. 

In the space $M(n)$ of $n \times n$ matrices, let $GL(n)$ be the set of all invertible matrices, $U(n)$ the set of all unitary matrices, and $P(n)$ the set of all positive (definite) matrices. All three are differentiable manifolds. The set $GL(n)$ is an open subset of $M(n)$, and hence the tangent space to $GL(n)$ at each of its points is the space $M(n)$. The tangent space to $U(n)$ at the point $I$, written as $T_I U(n)$, is the space $K(n)$ of all skew-Hermitian matrices. This has been explained in Section VI.4. The tangent space at any other point $U$ of $U(n)$ is $T_U U(n) = U \cdot K(n) = \{US : S \in K(n)\}$. Let $H(n)$ be the space of all Hermitian matrices. Both $H(n)$ and $K(n)$ are real vector spaces and $H(n) = iK(n)$. The set $P(n)$ is an open subset of $H(n)$, and hence, the tangent space to $P(n)$ at each of its points is $H(n)$.

The polar decomposition gives a differentiable map $\Phi$ from $GL(n)$ onto $U(n) \times P(n)$. This is the map $\Phi(A) = (\Phi_1(A), \Phi_2(A)) = (U, P)$, where the invertible matrix $A$ has the polar decomposition $A = UP$. Earlier in this section we called $\Phi_2(A)$ just $\varphi(A)$ and evaluated its Fréchet derivative. An explicit formula for the derivative $D\Phi_1(A)$ is obtained below. This map is a linear map from $M(n)$, the tangent space to $GL(n)$, into the space $U \cdot K(n)$, the tangent space to $U(n)$ at the point $U$.

The main idea of the proof below is simple. Let $\Psi$ be the map from $U(n) \times P(n)$ to $GL(n)$ that is the inverse to $\Phi$; i.e., $\Psi(U, P) = UP$. This is a much simpler object to handle, since it is just a product map. We can calculate the derivative of this map and then use the inverse function theorem to get the derivative of the map $\Phi$.


Theorem X.3.9 Let $\Phi_1$ be the map from $GL(n)$ into $U(n)$ that takes an invertible matrix to the unitary part in its polar decomposition, $\Phi_1(UP) = U$. Then for each $X \in M(n)$, the value of the derivative $D\Phi_1(UP)$ at the point $UX$ is given by the formula

$$[D\Phi_1(UP)](UX) = 2U \int_0^\infty e^{-tP}(i \operatorname{Im} X)e^{-tP}\, dt. \tag{X.47}$$
0 


Proof. The domain of the linear map $D\Psi(U, P)$ is the tangent space to the manifold $U(n) \times P(n)$ at the point $(U, P)$. This space is $(U \cdot K(n), H(n))$. The range of $D\Psi(U, P)$ is the tangent space to $GL(n)$ at $UP$. This space is $M(n)$. We will use the decomposition $M(n) = U \cdot K(n) + U \cdot H(n)$ that arises from the Cartesian decomposition. By the definition of the derivative, we have

$$[D\Psi(U, P)](US, H) = \left.\frac{d}{dt}\right|_{t=0} \Psi(Ue^{tS}, P + tH) = \left.\frac{d}{dt}\right|_{t=0} Ue^{tS}(P + tH) = USP + UH$$

for all $S \in K(n)$ and $H \in H(n)$.

The derivative $D\Phi(UP)$ is a linear map from $M(n)$ onto $(U \cdot K(n), H(n))$. Suppose

$$[D\Phi(UP)](UX) = (UM, N).$$

Since $\Phi = \Psi^{-1}$, from the two equations above we see that

$$UX = [D\Phi(UP)]^{-1}(UM, N) = [D\Psi(U, P)](UM, N) = UMP + UN.$$

Hence,

$$X = MP + N.$$

Our task now is to find $M$ from this equation. Note that $M$ is skew-Hermitian and $N$ Hermitian. Hence, from the above equation, we obtain

$$MP + PM = X - X^* = 2i \operatorname{Im} X.$$

This equation was studied in Chapter 7. From Theorem VII.2.3 we have its solution

$$M = 2 \int_0^\infty e^{-tP}(i \operatorname{Im} X)e^{-tP}\, dt.$$

This gives us the expression (X.47). ■
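
The formula (X.47) can be tested against a finite difference. In an eigenbasis of $P$ the integral has the closed form $M_{ij} = (X - X^*)_{ij}/(p_i + p_j)$, where $p_i$ are the eigenvalues of $P$; this is what the sketch below uses. The function names, the SVD-based polar decomposition, and the test matrices are illustrative assumptions.

```python
import numpy as np

def polar(A):
    """Polar decomposition A = U P, obtained from the SVD A = W diag(s) Vh."""
    W, s, Vh = np.linalg.svd(A)
    return W @ Vh, (Vh.conj().T * s) @ Vh

def dpolar_unitary(A, Z):
    """Derivative of A -> U at A in the direction Z = U X, following (X.47).

    M solves M P + P M = X - X*, and the derivative is U M.
    """
    U, P = polar(A)
    X = U.conj().T @ Z
    p, V = np.linalg.eigh(P)
    Rt = V.conj().T @ (X - X.conj().T) @ V      # X - X* in the eigenbasis of P
    Mt = Rt / (p[:, None] + p[None, :])         # closed form of the integral
    return U @ V @ Mt @ V.conj().T

rng = np.random.default_rng(2)
n = 4
cplx = lambda: rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
G = cplx()
Q0, _ = np.linalg.qr(cplx())
A = Q0 @ (G @ G.conj().T + np.eye(n))   # unitary times positive definite, P >= I
Z = cplx()

D = dpolar_unitary(A, Z)
eps = 1e-6
fd = (polar(A + eps * Z)[0] - polar(A - eps * Z)[0]) / (2 * eps)
fd_error = np.linalg.norm(D - fd, 2)

_, P = polar(A)
op = lambda M: np.linalg.norm(M, 2)
bound_ok = op(D) <= op(np.linalg.inv(P)) * op(Z) + 1e-12   # ||D|| <= ||P^{-1}|| ||Z||
```

The last line checks, in the operator norm, the estimate $\|D\Phi_1(UP)(Z)\| \le \|P^{-1}\|\,\|Z\|$ that follows from the formula.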


Corollary X.3.10 For every unitarily invariant norm we have

$$|||D\Phi_1(UP)||| = \|P^{-1}\|. \tag{X.48}$$

Proof. Using (X.47) and properties of unitarily invariant norms, we see that

$$|||D\Phi_1(UP)(UX)||| \le 2 \int_0^\infty \|e^{-tP}\|\; |||X|||\; \|e^{-tP}\|\, dt.$$

If $P$ has eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n$, then $\|e^{-tP}\| = e^{-t\alpha_n}$. So,

$$|||D\Phi_1(UP)(UX)||| \le 2 \int_0^\infty e^{-2t\alpha_n}\, |||X|||\, dt = \frac{1}{\alpha_n}\, |||X||| = \|P^{-1}\|\; |||X|||.$$

Hence,

$$|||D\Phi_1(UP)||| = \sup_{|||X||| = 1} |||D\Phi_1(UP)(UX)||| \le \|P^{-1}\|.$$

The choice $X = ivv^*/|||vv^*|||$, where $v$ is an eigenvector of $P$ belonging to the eigenvalue $\alpha_n$, shows that the last inequality is in fact an equality. ■


Two corollaries follow; the first one is obtained using the mean value 
theorem and the second one using Taylor’s Theorem. 


Corollary X.3.11 Let $A_0, A_1$ be two elements of $GL(n)$, and let $U_0, U_1$ be the unitary factors in their polar decompositions. Suppose that the line segment $A(t) = (1 - t)A_0 + tA_1$, $0 \le t \le 1$, lies inside $GL(n)$. Then, for every unitarily invariant norm

$$|||U_0 - U_1||| \le \max_{0 \le t \le 1} \|A(t)^{-1}\| \cdot |||A_0 - A_1|||. \tag{X.49}$$

Corollary X.3.12 Let $A_0$ be an invertible matrix with polar decomposition $A_0 = U_0 P_0$. Then, for a matrix $A = UP$ in a neighbourhood of $A_0$, we have

$$|||U_0 - U||| \le \|A_0^{-1}\|\; |||A_0 - A||| + O(|||A_0 - A|||^2). \tag{X.50}$$


Exercise X.3.13 From the proof of Theorem X.3.9 one can also extract a bound for the derivative of the map $A \mapsto |A|$. What does this give? Compare it with the result of Theorem X.3.3.


Let us see now how this method works for a perturbation analysis of the QR decomposition.

Let $\Delta_+(n)$ be the set of all upper triangular matrices with positive diagonal entries. Each element $A$ of $GL(n)$ has a unique factoring $A = QR$, where $Q \in U(n)$ and $R \in \Delta_+(n)$. Thus the QR decomposition gives rise to an invertible map $\Phi$ from $GL(n)$ onto $U(n) \times \Delta_+(n)$. Let $\Delta_{re}(n)$ be the set of all upper triangular matrices with real diagonal entries. This is a real vector space, and $\Delta_+(n)$ is an open set in it. Thus the tangent space to $\Delta_+(n)$, at any of its points, is $\Delta_{re}(n)$. For each $A = QR$ in $GL(n)$ the derivative $D\Phi(A)$ is a linear map from the vector space $M(n)$ onto the vector space $(Q \cdot K(n), \Delta_{re}(n))$. We want to calculate the norm of this.

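First note that the spaces $K(n)$ and $\Delta_{re}(n)$ are complementary to each other in $M(n)$. We have a vector space decomposition

$$M(n) = K(n) + \Delta_{re}(n). \tag{X.51}$$

Every matrix $X$ splits as $X = K + T$ in this decomposition; the entries of $X$, $K$ and $T$ are related as follows:

$$\begin{aligned}
k_{jj} &= i \operatorname{Im} x_{jj} && \text{for all } j, \\
k_{ij} &= -\bar{x}_{ji} && \text{for } j > i, \\
k_{ij} &= x_{ij} && \text{for } i > j; \\
t_{jj} &= \operatorname{Re} x_{jj} && \text{for all } j, \\
t_{ij} &= x_{ij} + \bar{x}_{ji} && \text{for } j > i, \\
t_{ij} &= 0 && \text{for } i > j.
\end{aligned} \tag{X.52}$$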
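
The splitting (X.52) translates directly into code. The sketch below builds $K$ from the strictly lower triangular part and the imaginary part of the diagonal of $X$, and obtains $T$ as the remainder; the function name and the sample matrix are illustrative.

```python
import numpy as np

def split_skew_triangular(X):
    """Split X = K + T as in (X.51): K skew-Hermitian, T upper triangular with real diagonal."""
    X = np.asarray(X, dtype=complex)
    L = np.tril(X, k=-1)                               # strictly lower part: k_ij = x_ij for i > j
    K = L - L.conj().T + 1j * np.diag(np.diag(X).imag) # upper part -conj(x_ji), diagonal i Im x_jj
    T = X - K                                          # upper triangular, diagonal Re x_jj
    return K, T

X = np.array([[1 + 2j, 3.0], [4j, 5.0]])
K, T = split_skew_triangular(X)
# K = [[2j, 4j], [4j, 0]] and T = [[1, 3 - 4j], [0, 5]]
```

Computing $T$ as $X - K$ makes the identity $X = K + T$ hold exactly, and the entries of $T$ then agree with the last three lines of (X.52).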


Exercise X.3.14 Let $\mathcal{P}_1$ and $\mathcal{P}_2$ be the complementary projection operators in $M(n)$ corresponding to the decomposition (X.51). Show that

$$\|\mathcal{P}_1\|_2 = \|\mathcal{P}_2\|_2 = \sqrt{2},$$

where $\|\mathcal{P}_j\|_2 = \sup_{\|X\|_2 = 1} \|\mathcal{P}_j X\|_2$, and $\|\cdot\|_2$ stands for the Frobenius (Hilbert-Schmidt) norm.


Now let $\Psi$ be the map from $U(n) \times \Delta_+(n)$ onto $GL(n)$ defined as $\Psi(Q, R) = QR$. Then $\Psi$ and $\Phi$ are inverse to each other. The derivative $D\Psi(Q, R)$ is a linear map whose domain is the tangent space to the manifold $U(n) \times \Delta_+(n)$ at the point $(Q, R)$. This space is $(Q \cdot K(n), \Delta_{re}(n))$. Its range is the space $M(n) = Q \cdot K(n) + Q \cdot \Delta_{re}(n)$. By the definition of the derivative, we have

$$[D\Psi(Q, R)](QK, T) = \left.\frac{d}{dt}\right|_{t=0} \Psi(Qe^{tK}, R + tT) = \left.\frac{d}{dt}\right|_{t=0} Qe^{tK}(R + tT) = QKR + QT$$

for all $K \in K(n)$ and $T \in \Delta_{re}(n)$.


The derivative $D\Phi(QR)$ is a linear map from $M(n)$ onto $Q \cdot K(n) + \Delta_{re}(n)$. Suppose

$$[D\Phi(QR)](QX) = (QM, N),$$

where $M \in K(n)$ and $N \in \Delta_{re}(n)$. Then we must have

$$QX = [D\Phi(QR)]^{-1}(QM, N) = [D\Psi(Q, R)](QM, N) = QMR + QN.$$

Hence

$$X = MR + N.$$


So, we have the same kind of equation as we had in the analysis of the polar decomposition. There is one vital difference, however. There, the matrices $M, N$ were skew-Hermitian and Hermitian, respectively, and instead of the upper triangular factor $R$ we had the positive factor $P$. So, taking adjoints, we could eliminate $N$ and get another equation that we could solve explicitly. We cannot do that here. But there is another way out. We have from the above equation

$$XR^{-1} = M + NR^{-1}. \tag{X.53}$$

Here $M \in K(n)$; and both $N$ and $R^{-1}$ are in $\Delta_{re}(n)$, and hence so is their product $NR^{-1}$. Thus the equation (X.53) is nothing but the decomposition of $XR^{-1}$ with respect to the vector space decomposition (X.51). In this way, we now know $M$ and $N$ explicitly. We thus have the following theorem.


Theorem X.3.15 Let $\Phi_1, \Phi_2$ be the maps from $GL(n)$ into $U(n)$ and $\Delta_+(n)$ that take an invertible matrix to the unitary and the upper triangular factors in its QR decomposition. Then for each $X \in M(n)$, the derivatives $D\Phi_1(QR)$ and $D\Phi_2(QR)$ evaluated at the point $QX$ are given by the formulae

$$[D\Phi_1(QR)](QX) = Q\,\mathcal{P}_1(XR^{-1}),$$
$$[D\Phi_2(QR)](QX) = \mathcal{P}_2(XR^{-1})\,R,$$

where $\mathcal{P}_1$ and $\mathcal{P}_2$ are the complementary projection operators in $M(n)$ corresponding to the decomposition (X.51).
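
The two formulae of Theorem X.3.15 can be compared with finite differences of a computed QR factorisation. In the sketch below, `qr_pos` rescales NumPy's factorisation so that $R$ has a positive diagonal, and `split_skew_triangular` implements the projections of (X.51); all names and the test matrices are illustrative assumptions.

```python
import numpy as np

def qr_pos(A):
    """QR factorisation normalised so that R has a positive diagonal."""
    Q, R = np.linalg.qr(A)
    d = np.diag(R)
    phase = d / np.abs(d)
    return Q * phase, phase.conj()[:, None] * R    # (Q D, D* R) with D = diag(phase)

def split_skew_triangular(X):
    """X = K + T with K skew-Hermitian, T upper triangular with real diagonal."""
    L = np.tril(X, k=-1)
    K = L - L.conj().T + 1j * np.diag(np.diag(X).imag)
    return K, X - K

def dqr(A, Z):
    """Derivatives of the Q and R factors at A in the direction Z = Q X."""
    Q, R = qr_pos(A)
    X = Q.conj().T @ Z
    K, T = split_skew_triangular(X @ np.linalg.inv(R))
    return Q @ K, T @ R

rng = np.random.default_rng(4)
n = 4
cplx = lambda: rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = np.eye(n) + 0.1 * cplx()       # a well-conditioned invertible matrix
Z = cplx()

dQ, dR = dqr(A, Z)
eps = 1e-6
Qp, Rp = qr_pos(A + eps * Z)
Qm, Rm = qr_pos(A - eps * Z)
err_Q = np.linalg.norm(dQ - (Qp - Qm) / (2 * eps))
err_R = np.linalg.norm(dR - (Rp - Rm) / (2 * eps))
```

Since $A = QR$ along the whole perturbation, the two derivatives must also satisfy the product rule $dQ\,R + Q\,dR = Z$, which gives a second consistency check.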


Using the result of Exercise X.3.14, we obtain the first corollary below. 
Then the next two corollaries are obtained using the mean value theorem 
and Taylor’s Theorem. 


Corollary X.3.16 Let $\Phi_1, \Phi_2$ be the maps that take an invertible matrix $A$ to the $Q$ and $R$ factors in its QR decomposition. Then

$$\|D\Phi_1(A)\|_2 \le \sqrt{2}\, \|A^{-1}\|,$$
$$\|D\Phi_2(A)\|_2 \le \sqrt{2}\, \operatorname{cond}(A) = \sqrt{2}\, \|A\|\, \|A^{-1}\|.$$

Corollary X.3.17 Let $A_0, A_1$ be two elements of $GL(n)$ with their respective QR decompositions $A_0 = Q_0 R_0$ and $A_1 = Q_1 R_1$. Suppose that the line segment $A(t) = (1 - t)A_0 + tA_1$, $0 \le t \le 1$, lies in $GL(n)$. Then

$$\|Q_0 - Q_1\|_2 \le \sqrt{2}\, \max_{0 \le t \le 1} \|A(t)^{-1}\|\; \|A_0 - A_1\|_2,$$
$$\|R_0 - R_1\|_2 \le \sqrt{2}\, \max_{0 \le t \le 1} \operatorname{cond}(A(t))\; \|A_0 - A_1\|_2.$$

Corollary X.3.18 Let $A_0 = Q_0 R_0$ be an invertible matrix. Then for every matrix $A = QR$ close to $A_0$,

$$\|Q_0 - Q\|_2 \le \sqrt{2}\, \|A_0^{-1}\|\; \|A_0 - A\|_2 + O(\|A_0 - A\|_2^2),$$
$$\|R_0 - R\|_2 \le \sqrt{2}\, \operatorname{cond}(A_0)\; \|A_0 - A\|_2 + O(\|A_0 - A\|_2^2).$$

For most other unitarily invariant norms, the norms of the projections $\mathcal{P}_1$ and $\mathcal{P}_2$ onto the two summands in (X.51) are not as easy to calculate. Thus this method does not lead to attractive bounds for these norms in the case of the QR decomposition.


X.4. Appendix: Differential Calculus 


We will review very quickly some basic concepts of the Fréchet differential 
calculus, with special emphasis on matrix analysis. No proofs are given. 

Let $X, Y$ be real Banach spaces, and let $\mathcal{L}(X, Y)$ be the space of bounded linear operators from $X$ to $Y$. Let $U$ be an open subset of $X$. A continuous map $f$ from $U$ to $Y$ is said to be differentiable at a point $u$ of $U$ if there exists $T \in \mathcal{L}(X, Y)$ such that

$$\lim_{v \to 0} \frac{\|f(u + v) - f(u) - Tv\|}{\|v\|} = 0. \tag{X.54}$$

It is easy to see that such a $T$, if it exists, is unique.

If $f$ is differentiable at $u$, the operator $T$ above is called the derivative of $f$ at $u$. We will use for it the notation $Df(u)$. This is sometimes called the Fréchet derivative. If $f$ is differentiable at every point of $U$, we say that it is differentiable on $U$.


One can see that, if $f$ is differentiable at $u$, then for every $v \in X$,

$$Df(u)(v) = \left.\frac{d}{dt}\right|_{t=0} f(u + tv). \tag{X.55}$$

This is also called the directional derivative of $f$ at $u$ in the direction $v$. The reader will recall from elementary calculus of functions of two variables that the existence of directional derivatives in all directions does not ensure differentiability.
Some illustrative examples are given below. 


Example X.4.1 (i) The constant function $f(x) = c$ is differentiable at all points, and $Df(x) = 0$ for all $x$.

(ii) Every linear operator $T$ is differentiable at all points, and is its own derivative; i.e., $DT(u)(v) = Tv$, for all $u, v$ in $X$.

(iii) Let $X, Y, Z$ be real Banach spaces and let $B : X \times Y \to Z$ be a bounded bilinear map. Then $B$ is differentiable at every point, and its derivative $DB(u, v)$ is given as

$$DB(u, v)(x, y) = B(x, v) + B(u, y).$$

(iv) Let $X$ be a real Hilbert space with inner product $\langle \cdot, \cdot \rangle$, and let $f(u) = \|u\|^2 = \langle u, u \rangle$. Then $f$ is differentiable at every point and $Df(u)(v) = 2\langle u, v \rangle$.

The next set of examples is especially important for us. 


Example X.4.2 In these examples $X = Y = \mathcal{L}(\mathcal{H})$. 

(i) Let $f(A) = A^2$. Then 

\[ Df(A)(B) = AB + BA. \]

(ii) More generally, let $f(A) = A^n$, $n \ge 2$. From the binomial expansion 
for $(A+B)^n$ one can see that 

\[ Df(A)(B) = \sum_{\substack{j+k=n-1 \\ j,k \ge 0}} A^j B A^k. \]

(iii) Let $f(A) = A^{-1}$ for each invertible A. Then 

\[ Df(A)(B) = -A^{-1} B A^{-1}. \]

(iv) Let $f(A) = A^*A$. Then 

\[ Df(A)(B) = A^*B + B^*A. \]

(v) Let $f(A) = e^A$. Use the formula 

\[ e^{A+B} - e^A = \int_0^1 e^{(1-t)A} B e^{t(A+B)} \, dt \]

(called Dyson's expansion) to show that 

\[ Df(A)(B) = \int_0^1 e^{(1-t)A} B e^{tA} \, dt. \]
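
Formulas of this kind are easy to sanity-check numerically. The following is a minimal sketch, assuming 2 x 2 real matrices stored as nested lists; the helper names (`mul`, `add`, `inv`, `central_difference`) and the test matrices are hypothetical choices made for this illustration. It compares a central difference quotient of f along B with the claimed derivative for parts (i) and (iii).

```python
# Minimal 2x2 real matrix helpers (plain nested lists, no external libraries).
def mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add(P, Q, s=1.0):
    """Return P + s*Q."""
    return [[P[i][j] + s * Q[i][j] for j in range(2)] for i in range(2)]


def inv(P):
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det], [-P[1][0] / det, P[0][0] / det]]


def max_abs(P):
    return max(abs(x) for row in P for x in row)


def central_difference(f, A, B, t=1e-5):
    """Approximate Df(A)(B) by (f(A + tB) - f(A - tB)) / (2t)."""
    diff = add(f(add(A, B, t)), f(add(A, B, -t)), -1.0)
    return [[x / (2 * t) for x in row] for row in diff]


A = [[2.0, 1.0], [0.5, 3.0]]   # invertible: det = 5.5
B = [[0.3, -1.0], [2.0, 0.7]]  # an arbitrary direction

# (i) f(A) = A^2, claimed derivative AB + BA
square_formula = add(mul(A, B), mul(B, A))
square_numeric = central_difference(lambda M: mul(M, M), A, B)
square_error = max_abs(add(square_numeric, square_formula, -1.0))

# (iii) f(A) = A^{-1}, claimed derivative -A^{-1} B A^{-1}
Ainv = inv(A)
inverse_formula = [[-x for x in row] for row in mul(mul(Ainv, B), Ainv)]
inverse_numeric = central_difference(inv, A, B)
inverse_error = max_abs(add(inverse_numeric, inverse_formula, -1.0))

print(square_error, inverse_error)  # both should be very small
```

A check like this cannot replace a proof, but it catches sign errors and misplaced factors, which are the usual mistakes when the operators do not commute.
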


The usual rules of differentiation are valid: 
If $f_1, f_2$ are two differentiable maps, then $f_1 + f_2$ is differentiable and 

\[ D(f_1 + f_2)(u) = Df_1(u) + Df_2(u). \]

The composite of two differentiable maps f and g is differentiable and we 
have the chain rule 

\[ D(g \circ f)(u) = Dg(f(u)) \cdot Df(u). \]

In the special situation when g is linear, this reduces to 

\[ D(g \circ f)(u) = g \cdot Df(u). \]


One important rule of differentiation for real functions is the product 
rule: $(fg)' = f'g + fg'$. If f and g are two maps with values in a Banach 
space, their product is not defined, unless the range is an algebra as well. 
Still, a general product rule can be established. Let f, g be two differentiable 
maps from X into $Y_1$, $Y_2$, respectively. Let B be a continuous bilinear map 
from $Y_1 \times Y_2$ into Z. Let $\varphi$ be the map from X to Z defined as 
$\varphi(x) = B(f(x), g(x))$. Then for all u, v in X 

\[ D\varphi(u)(v) = B(Df(u)(v), g(u)) + B(f(u), Dg(u)(v)). \]

This is the product rule for differentiation. A special case of this arises 
when $Y_1 = Y_2 = \mathcal{L}(Y)$, the algebra of bounded operators in a Banach 
space Y. Now $\varphi(x) = f(x)g(x)$ is the usual product of two operators. The 
product rule then is 

\[ D\varphi(u)(v) = [Df(u)(v)] \cdot g(u) + f(u) \cdot [Dg(u)(v)]. \]


Exercise X.4.3 (i) Let f be the map $A \mapsto A^{-1}$ on GL(n). Use the product 
rule to show that 

\[ Df(A)(B) = -A^{-1} B A^{-1}. \]

This can also be proved directly. 

(ii) Let $f(A) = A^{-2}$. Show that 

\[ Df(A)(B) = -A^{-1} B A^{-2} - A^{-2} B A^{-1}. \]

(iii) Obtain a formula for the derivative of the map $f(A) = A^{-n}$, n = 
3, 4, .... 
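
One possible route for part (i), offered as a sketch rather than as the intended solution: apply the product rule to the constant map $\varphi(A) = f(A) \cdot g(A) = I$, where $g(A) = A$ is the identity map, which is linear and hence its own derivative. The symbols $\varphi$ and $g$ are introduced here only for illustration, and the differentiability of f is taken for granted.

```latex
\begin{aligned}
0 = D\varphi(A)(B) &= [Df(A)(B)] \cdot g(A) + f(A) \cdot [Dg(A)(B)] \\
                   &= [Df(A)(B)] \cdot A + A^{-1} B, \\
\text{so} \quad Df(A)(B) &= -A^{-1} B A^{-1}.
\end{aligned}
```

The same idea, applied to $A^{-2} = A^{-1} \cdot A^{-1}$, gives the two-term formula in part (ii).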


Perhaps the most useful theorem of calculus is the Mean Value Theorem. 

Theorem X.4.4 (The Mean Value Theorem) Let f be a differentiable map 
from an interval I of the real line into a Banach space X. Then for each 
closed interval [a, b] contained in I, 

\[ \|f(b) - f(a)\| \le |b - a| \sup_{a \le t \le b} \|Df(t)\|. \]

This is the version we have used often in the book, with [a, b] = [0, 1]. 
There is a more general statement: 


Theorem X.4.5 (The Mean Value Theorem) Let f be a differentiable map 
from a convex subset U of a Banach space X into the Banach space Y. Let 
$a, b \in U$ and let L be the line segment joining them. Then 

\[ \|f(b) - f(a)\| \le \|b - a\| \sup_{u \in L} \|Df(u)\|. \]

(Note that there are three different norms that occur in the above inequality. 
These are the norms of the spaces Y, X, and $\mathcal{L}(X,Y)$, respectively.) 

Higher order Fréchet derivatives can be identified with multilinear maps. 
This is explained below. 

Let f be a differentiable map from X to Y. At each point u, the derivative 
Df(u) is an element of the Banach space $\mathcal{L}(X,Y)$. Thus we have a 
map Df from X into $\mathcal{L}(X,Y)$, defined as $Df : u \mapsto Df(u)$. If this map 
is differentiable at a point u, we say that f is twice differentiable at u. 
The derivative of the map Df at the point u is called the second derivative 
of f at u. It is denoted as $D^2 f(u)$. This is an element of the space 
$\mathcal{L}(X, \mathcal{L}(X,Y))$. This space is isomorphic to another Banach space, which 
is easier to handle. 

Let $\mathcal{L}_2(X,Y)$ be the space of bounded bilinear maps from $X \times X$ into Y. 
The elements of this space are maps f from $X \times X$ into Y that are linear 
in both variables, and for which there exists a constant c such that 

\[ \|f(x_1, x_2)\| \le c \, \|x_1\| \, \|x_2\| \]

for all $x_1, x_2 \in X$. The infimum of all such c is called $\|f\|$. This is a norm 
on the space $\mathcal{L}_2(X,Y)$, and the space is a Banach space with this norm. 
If $\varphi$ is an element of $\mathcal{L}(X, \mathcal{L}(X,Y))$, let 

\[ \tilde{\varphi}(x_1, x_2) = [\varphi(x_1)](x_2) \quad \text{for } x_1, x_2 \in X. \]

Then $\tilde{\varphi} \in \mathcal{L}_2(X,Y)$. It is easy to see that the map $\varphi \mapsto \tilde{\varphi}$ is an isometric 
isomorphism. 

Thus the second derivative of a twice differentiable map f from X to Y 
can be thought of as a bilinear map from X x X to Y. It is easy to see that 
this map is symmetric in the two variables; i.e., 


\[ D^2 f(u)(v_1, v_2) = D^2 f(u)(v_2, v_1), \]


for all u,v1,v2. (This symmetry property is extremely helpful in guessing 
the expression for the second derivative of a given map.) 
Some examples on the space of matrices are given below. 


Example X.4.6 Let X = M(n) and let $f(A) = A^2$, $A \in M(n)$. We have 
seen that Df(A)(B) = AB + BA for all A, B. Note that $Df(A) = L_A + R_A$, 
where $L_A$ and $R_A$ are linear operators on M(n), the first one being left 
multiplication by A and the second one right multiplication by A. The 
map $Df : A \mapsto Df(A)$ is a linear map from M(n) into $\mathcal{L}(M(n))$. So 
the derivative of this map, at each point, is the map itself. Thus for each 
A, $D^2 f(A) = Df$. In other words, 

\[ [D^2 f(A)](B) = Df(B) = L_B + R_B. \]

If we think of $D^2 f(A)$ as a linear map from M(n) into $\mathcal{L}(M(n))$, we have 

\[ [D^2 f(A)(B_1)](B_2) = (L_{B_1} + R_{B_1})(B_2) = B_1 B_2 + B_2 B_1 \]

for all $B_1, B_2$. If we think of it as a bilinear map, we have 

\[ [D^2 f(A)](B_1, B_2) = B_1 B_2 + B_2 B_1. \]

Note that the right-hand side is independent of A. So the map $A \mapsto D^2 f(A)$ 
is a constant map. These are noncommutative analogues of the facts that 
if $f(x) = x^2$, then $f'(x) = 2x$ and $f''(x) = 2$. 


Example X.4.7 Let $f(A) = A^3$. We have seen that 

\[ Df(A)(B) = A^2 B + ABA + BA^2. \]

This is the noncommutative analogue of the fact that if $f(x) = x^3$, then 
$f'(x) = 3x^2$. What is the second derivative? From the formula $f''(x) = 6x$, 
and the fact that $D^2 f(A)$ is a symmetric bilinear map, we can guess that 

\[ [D^2 f(A)](B_1, B_2) = AB_1B_2 + B_1AB_2 + B_1B_2A + AB_2B_1 + B_2AB_1 + B_2B_1A. \]

Prove that this indeed is the right formula for $D^2 f(A)$. Note that the map 
$A \mapsto D^2 f(A)$ is linear. 


Example X.4.8 More generally, let $f(A) = A^n$. From the binomial theorem 
one can see that 

\[ [D^2 f(A)](B_1, B_2) = \sum_{\substack{j+k+l=n-2 \\ j,k,l \ge 0}} \left[ A^j B_1 A^k B_2 A^l + A^j B_2 A^k B_1 A^l \right]. \]


Example X.4.9 Let $f(A) = A^{-1}$, $A \in GL(n)$. We know that 
$Df(A)(B) = -A^{-1} B A^{-1}$ for all $B \in M(n)$. This is the noncommutative analogue of 
the formula $(x^{-1})' = -x^{-2}$. The analogue of the formula $(x^{-1})'' = 2x^{-3}$ 
is the following: 

\[ [D^2 f(A)](B_1, B_2) = A^{-1} B_1 A^{-1} B_2 A^{-1} + A^{-1} B_2 A^{-1} B_1 A^{-1}. \]

This can be guessed from the bilinearity and symmetry properties that 
$D^2 f(A)$ must have. It can be proved formally by the rules of differentiation. 


Example X.4.10 Let $f(A) = A^{-2}$, $A \in GL(n)$. We know that 
$Df(A)(B) = -A^{-1} B A^{-2} - A^{-2} B A^{-1}$. Show that 

\[
\begin{aligned}
[D^2 f(A)](B_1, B_2) ={}& A^{-2} B_1 A^{-1} B_2 A^{-1} + A^{-2} B_2 A^{-1} B_1 A^{-1} \\
&+ A^{-1} B_1 A^{-2} B_2 A^{-1} + A^{-1} B_2 A^{-2} B_1 A^{-1} \\
&+ A^{-1} B_1 A^{-1} B_2 A^{-2} + A^{-1} B_2 A^{-1} B_1 A^{-2}.
\end{aligned}
\]

This is the analogue of the formula $(x^{-2})'' = 6x^{-4}$. 

Example X.4.11 Let $f(A) = A^*A$. We have seen that 
$Df(A)(B) = A^*B + B^*A$. Show that $D^2 f(A)(B_1, B_2) = B_1^* B_2 + B_2^* B_1$. Note that this 
expression does not involve A. So the map $A \mapsto D^2 f(A)$ is a constant map. 


Derivatives of higher order can be defined by repeating the above proce- 
dure. The pth derivative of a map f from X to Y can be identified with a 
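p-linear map from the space $X \times X \times \cdots \times X$ (p copies) into Y. A convenient 
method of calculating the pth derivative of f is provided by the formula 

\[ D^p f(u)(v_1, \ldots, v_p) = \left. \frac{\partial^p}{\partial t_1 \cdots \partial t_p} \right|_{t_1 = \cdots = t_p = 0} f(u + t_1 v_1 + \cdots + t_p v_p). \tag{X.56} \]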


Compare this with the formula (X.55) for the first derivative. 
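
Formula (X.56) also suggests a numerical check of guessed second derivatives. The sketch below is an illustration under stated assumptions: 2 x 2 real matrices as nested lists, hypothetical helper names (`mul`, `comb`, `cube`, `shifted`), and arbitrarily chosen test matrices. It approximates the mixed partial derivative in (X.56) for f(A) = A^3 by a central mixed difference and compares it with the six-term expression of Example X.4.7.

```python
def mul(*Ms):
    """Product of one or more 2x2 matrices, taken left to right."""
    P = Ms[0]
    for Q in Ms[1:]:
        P = [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return P


def comb(*terms):
    """Linear combination: comb((c1, M1), (c2, M2), ...) = c1*M1 + c2*M2 + ..."""
    return [[sum(c * M[i][j] for c, M in terms) for j in range(2)]
            for i in range(2)]


def cube(M):
    return mul(M, M, M)


A = [[1.0, 2.0], [0.0, -1.0]]
B1 = [[0.5, 1.0], [1.5, 0.0]]
B2 = [[-1.0, 0.2], [0.3, 2.0]]
h = 1e-3


def shifted(s, t):
    """f(A + s*B1 + t*B2) for f(A) = A^3."""
    return cube(comb((1.0, A), (s, B1), (t, B2)))


# Central mixed difference approximating d^2/(ds dt) at s = t = 0, as in (X.56).
mixed = comb((1.0, shifted(h, h)), (-1.0, shifted(h, -h)),
             (-1.0, shifted(-h, h)), (1.0, shifted(-h, -h)))
numeric = [[x / (4 * h * h) for x in row] for row in mixed]

# The six words of Example X.4.7.
words = [(A, B1, B2), (B1, A, B2), (B1, B2, A),
         (A, B2, B1), (B2, A, B1), (B2, B1, A)]
formula = comb(*[(1.0, mul(*w)) for w in words])

second_derivative_error = max(abs(numeric[i][j] - formula[i][j])
                              for i in range(2) for j in range(2))
print(second_derivative_error)  # should be very small
```

Because f is a cubic polynomial, the central mixed difference isolates the coefficient of st exactly, so the only discrepancy is floating-point rounding.
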
Example X.4.12 Let $f(A) = A^n$, $A \in \mathcal{L}(\mathcal{H})$. Then for p = 1, 2, ..., n 

\[ [D^p f(A)](B_1, \ldots, B_p) = \sum_{\sigma \in S_p} \; \sum_{\substack{j_i \ge 0 \\ j_1 + \cdots + j_{p+1} = n-p}} A^{j_1} B_{\sigma(1)} A^{j_2} B_{\sigma(2)} \cdots A^{j_p} B_{\sigma(p)} A^{j_{p+1}}, \]

where $S_p$ is the set of all permutations on p symbols. There are $\frac{n!}{(n-p)!}$ terms 
in the above double sum. These are all words of length n in which $n - p$ of 
the letters are A and the remaining letters are $B_1, \ldots, B_p$, each occurring 
exactly once. Notice that this expression is linear and symmetric in each of 
the variables $B_1, \ldots, B_p$. When dim $\mathcal{H}$ = 1, this reduces to the formula for 
the pth derivative of the function $f(x) = x^n$: 

\[ f^{(p)}(x) = n(n-1) \cdots (n-p+1) \, x^{n-p} = \frac{n!}{(n-p)!} \, x^{n-p}. \]
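
The count of terms in Example X.4.12 can be confirmed by brute force. The following sketch (the function name `derivative_words` is a hypothetical choice for this illustration) enumerates all words of length n that contain n - p copies of the letter A and each of B1, ..., Bp exactly once, and compares their number with n!/(n-p)!.

```python
from itertools import permutations
from math import factorial


def derivative_words(n, p):
    """All words of length n with n - p letters 'A' and B1..Bp each used once."""
    letters = ["A"] * (n - p) + ["B%d" % i for i in range(1, p + 1)]
    # permutations() treats the equal letters 'A' as distinct, so the set
    # removes the resulting duplicates before sorting.
    return sorted(set(permutations(letters)))


n, p = 4, 2
words = derivative_words(n, p)
print(len(words), factorial(n) // factorial(n - p))
for w in words:
    print(" ".join(w))
```

For n = 3 and p = 2 the enumeration produces the six words that appear in the second derivative of the map A ↦ A^3 in Example X.4.7.
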

The reader should work out some more simple examples to see the ex- 
pressions for higher derivatives. 

Another important theorem of calculus, Taylor’s Theorem, has an 
analogue in the Fréchet calculus. Of the different versions possible, the one 
that is most useful for us is given below. 

Let f be a (p+1)-times differentiable map from a Banach space X into a 
Banach space Y. For $h \in X$, write $[h]^m$ to mean the m-tuple (h, h, ..., h). 
Then, for all $x \in X$ and for small h, 

\[ \left\| f(x+h) - f(x) - \sum_{m=1}^{p} \frac{1}{m!} D^m f(x)([h]^m) \right\| = O(\|h\|^{p+1}). \]

From this we get 

\[ \|f(x+h) - f(x)\| \le \sum_{m=1}^{p} \frac{1}{m!} \|D^m f(x)\| \, \|h\|^m + O(\|h\|^{p+1}). \]

Finally, let us write down formulae for higher order derivatives of the 
composite of two maps. These are already quite intricate for real functions. 
If we have $\varphi(x) = f(g(x))$, then we have 

\[
\begin{aligned}
\varphi^{(1)}(x) &= f^{(1)}(g(x)) \, g^{(1)}(x), \\
\varphi^{(2)}(x) &= f^{(2)}(g(x)) \, [g^{(1)}(x)]^2 + f^{(1)}(g(x)) \, g^{(2)}(x), \\
\varphi^{(3)}(x) &= f^{(3)}(g(x)) \, [g^{(1)}(x)]^3 + 3 f^{(2)}(g(x)) \, g^{(1)}(x) \, g^{(2)}(x) + f^{(1)}(g(x)) \, g^{(3)}(x).
\end{aligned}
\]

If X, Y, Z are Banach spaces and if g is a map from X to Y, and f a map 
from Y to Z, then for the derivatives of the composite map $\varphi = f \circ g$, we 
have the following formulae. By the chain rule, 

\[ D\varphi(x) = Df(g(x)) \, Dg(x). \]

The second and the third derivatives are bilinear and trilinear maps, respectively. 
For them we have the formulae: 

\[
\begin{aligned}
[D^2\varphi(x)](x_1, x_2) ={}& [D^2 f(g(x))](Dg(x)(x_1), Dg(x)(x_2)) \\
&+ Df(g(x))([D^2 g(x)](x_1, x_2)),
\end{aligned}
\]

\[
\begin{aligned}
[D^3\varphi(x)](x_1, x_2, x_3) ={}& [D^3 f(g(x))](Dg(x)(x_1), Dg(x)(x_2), Dg(x)(x_3)) \\
&+ [D^2 f(g(x))](Dg(x)(x_1), [D^2 g(x)](x_2, x_3)) \\
&+ [D^2 f(g(x))](Dg(x)(x_2), [D^2 g(x)](x_1, x_3)) \\
&+ [D^2 f(g(x))](Dg(x)(x_3), [D^2 g(x)](x_1, x_2)) \\
&+ Df(g(x))([D^3 g(x)](x_1, x_2, x_3)).
\end{aligned}
\]


The reader should convince himself that considerations of domains and 
ranges of the maps involved, symmetry in the variables, and the demand 
that in the case of real functions we should recover the old formulae lead 
to these general formulae. He can then try proving them. 

We have also used the notion of the derivative of a map between manifolds. 
If X and Y are differentiable manifolds in finite-dimensional vector 
spaces, and f is a differentiable map from X to Y, then at a point u of X 
the derivative Df(u) is a linear map from the linear space $T_u X$ into the linear 
space $T_{f(u)} Y$. These are the tangent spaces to X and Y at u and f(u), 
respectively. All manifolds we considered are subsets of M(n). Of these, 
GL(n), P(n), and $\Delta_+(n)$ are open subsets of vector subspaces of M(n). 
So these vector spaces are the tangent spaces for the manifolds. The only 
closed manifold we considered is U(n). It is easy to find the tangent space 
at any point of this manifold. This was done in Chapter 6. Most of the 
results of Fréchet calculus can be restated in this setup with appropriate 
modifications. 

X.5 Problems 


Problem X.5.1. Let f be a nonnegative operator monotone function on 
$(0, \infty)$. 
(i) Show that if A is positive and U unitary, then 

\[ |||f(A)U - Uf(A)||| \le |||f(|AU - UA|)|||. \]

(ii) Let A be positive and X Hermitian. Let U be the Cayley transform 
of X. We have 

\[
\begin{aligned}
U &= (X - i)(X + i)^{-1}, \\
X &= i(I + U)(I - U)^{-1} = 2i(I - U)^{-1} - i.
\end{aligned}
\]


Show that 


| F(A)X — Xf(A)I < 2 — 0)" PII F(AU — UA) II. 


Use the relation between U and X again to estimate the last factor, and 
show that 


Isa)x ~ xpayiy s BCD) (jax - xa) 


(iii) Let A, B be positive and X arbitrary. Use the above inequality to 
show that 


2 


WAX — Xs(B)I s FBG; (I 


|AX — xB|) r 


[Hint: Use 2 × 2 block matrices.] 
When X = I, this reduces to the inequality (X.5). 


Problem X.5.2. Let f be a nonnegative operator monotone function. Let 
A, B be positive matrices and let X be any contraction. Show that 

\[ |||f(A)X - Xf(B)||| \le \frac{5}{4} \, |||f(|AX - XB|)|||. \]

[Hint: Use the result of the preceding problem, replacing X there by $\frac{1}{2}X$.] 


Problem X.5.3. From the above inequality it follows that if A, B are 
positive and X is any matrix, then for 0 < r < 1, 

\[ \|A^r X - XB^r\| \le \frac{5}{4} \, \|X\|^{1-r} \, \|AX - XB\|^r. \]

Show that we have under these conditions 

\[ \|A^r X - XB^r\|_2 \le \|X\|_2^{1-r} \, \|AX - XB\|_2^r. \]

[Hint: Reduce the general case to the special case A = B. Use Hölder's 
inequality.] 


Problem X.5.4. Let A,B be any two operators. Show that 


[ILA] — |BI)*Il| < A+ Bl] [JA — BI]. 


Problem X.5.5. Let A be a positive operator such that $A \ge aI > 0$. Show 
that for every X 

\[ 2a \, |||X||| \le |||AX + XA|||. \]

[Use the results in Section VII.2.] 
Use this to show that if A, B are positive operators such that 
$A^{1/2} + B^{1/2} \ge aI > 0$, then 

\[ |||A^{1/2} - B^{1/2}||| \le \frac{1}{a} \, |||A - B|||. \]

[Hint: Consider the operators $A^{1/2} + B^{1/2}$ and $A^{1/2} - B^{1/2}$.] 


Problem X.5.6. Let A and B be positive operators such that $A \ge aI > 0$ 
and $B \ge bI > 0$. Show that for every nonnegative operator monotone 
function f on $(0, \infty)$ 

\[ |||f(A) - f(B)||| \le C(a, b) \, |||A - B|||, \]

where $C(a, b) = \frac{f(a) - f(b)}{a - b}$ if $a \ne b$, and $C(a, b) = f'(a)$ if a = b. 


Problem X.5.7. Let f be a real function on $(0, \infty)$, and let $f^{(n)}$ be its nth 
derivative. Let f also denote the map induced by f on positive operators. 
Let $D^n f(A)$ be the nth order Fréchet derivative of this map at the point 
A. Let 

\[ \mathcal{D}^{(n)} = \{ f : \|D^n f(A)\| = \|f^{(n)}(A)\| \text{ for all positive } A \}. \]

We have seen that every operator monotone function is in the class $\mathcal{D}^{(1)}$. 
Show that it is in $\mathcal{D}^{(n)}$ for all n = 1, 2, .... 


Problem X.5.8. Several examples of functions that are not operator 
monotone but are in $\mathcal{D}^{(1)}$ are given below. 

(i) Show that for each integer n, the function $f(t) = t^n$ on $(0, \infty)$ is in 
the class $\mathcal{D}^{(1)}$. 

(ii) Show that the function $f(t) = a_0 + a_1 t + \cdots + a_n t^n$ on $(0, \infty)$, where 
n is any positive integer and the coefficients $a_j$ are nonnegative, is in 
the class $\mathcal{D}^{(1)}$. 

(iii) Any function on $(0, \infty)$ that has a power series expansion with nonnegative 
coefficients is in the class $\mathcal{D}^{(1)}$. 

(iv) Use the Dyson expansion to show that the exponential function is in 
the class $\mathcal{D}^{(1)}$. 

(v) Let $f(t) = \int_0^\infty e^{-\lambda t} \, d\mu(\lambda)$, where $\mu$ is a positive measure on $(0, \infty)$. 
Show that $f \in \mathcal{D}^{(1)}$. [Use part (iv).] 

(vi) From Euler's integral for the gamma function, we can write, for 
r > 0, 

\[ t^{-r} = \frac{1}{\Gamma(r)} \int_0^\infty e^{-\lambda t} \lambda^{r-1} \, d\lambda. \]

Use this to show that for each r > 0, the function $f(t) = t^{-r}$ on 
$(0, \infty)$ is in $\mathcal{D}^{(1)}$. 


Problem X.5.9. The Cholesky decomposition of a positive definite matrix 
A is the (unique) factoring A = R*R, where R is an upper triangular 
matrix with positive diagonal entries. This gives an invertible map $\Phi$ from 
the space P(n) onto the space $\Delta_+(n)$. Show that 

\[ \|D\Phi(A)\|_2 \le \frac{1}{\sqrt{2}} \, \|A\|^{1/2} \, \|A^{-1}\| \]

for every A. Use this to write local perturbation bounds for the map $\Phi$. 


Problem X.5.10. A matrix is called strongly nonsingular if all of its 
leading principal minors are nonzero. Such matrices form a dense open set 
in the space M(n). Every strongly nonsingular matrix A can be factored 
as A = LR, where L is a lower triangular matrix and R an upper trian- 
gular matrix. Further, L can be chosen to have all of its diagonal entries 
equal to 1. With this restriction the factoring is unique. This is the LR 
decomposition familiar in linear algebra and numerical analysis. 

Let S be the set of strongly nonsingular matrices, $\Delta_1$ the set of lower 
triangular matrices with unit diagonal, and $\Delta_{ns}$ the set of nonsingular 
upper triangular matrices. Let $\Phi_1, \Phi_2$ be the maps from S into $\Delta_1$ and $\Delta_{ns}$ 
given by the LR decomposition. 

The set S is an open set in M(n). So the tangent space to it at any point 
is M(n). The set $\Delta_{ns}$ is an open subset of the vector space $\Delta$ consisting 
of all upper triangular matrices. So the tangent space to $\Delta_{ns}$ at any point 
is $\Delta$. The set $\Delta_1$ is a differentiable manifold (a Lie group, in fact). The 
tangent space at I to this manifold is the space $\Delta_0$ consisting of lower 
triangular matrices with zero diagonal. 

Follow the approach in Section X.3 to obtain the bounds: 

\[
\|D\Phi_1(A)\|_2 \le \mathrm{cond}(L) \, \|R^{-1}\|, \qquad
\|D\Phi_2(A)\|_2 \le \mathrm{cond}(R) \, \|L^{-1}\|.
\]


Use these to obtain local perturbation bounds for the LR decomposition. 


X.6 Notes and References 


Most of the results in this chapter can be proved for infinite dimensional 
Hilbert space operators. Many of them are valid for operator algebras as 
well. 

Let f be a continuous real function on an interval that contains the spec- 
tra of two Hermitian operators A, B (on a Hilbert space H). The problem 
of finding bounds for || f(A) — f(B)|| in terms of ||A — B|| has been inves- 
tigated in great detail by many authors. Many deep results on this were 
obtained by the Russian school of Birman, which includes Farforovskaya, 
Naboko, Solomyak, and others. 


When f is differentiable and f’ is bounded, one would expect to find 
inequalities of the form 


\[ \|f(A) - f(B)\| \le c \, \|f'\|_\infty \, \|A - B\|. \]


Counterexamples to show that such inequalities are not true, in general, 
were constructed by Yu.B. Farforovskaya, An estimate of the norm 
|| f(B) — f(A)]| for self-adjoint operators A and B, Zap. Nauch. Sem LOMI, 
56(1976) 143-162. (English translation: J. Soviet Math. 14, No. 2(1980).) It 
was shown by M. Sh. Birman and M.Z. Solomyak that such inequalities can 
be found under stronger smoothness assumptions. The reader should see 
their paper titled Double Stieltjes operator integrals, English translation, in 
Topics in Mathematical Physics, Volume 1, Consultant Bureau, New York, 
1967. 

Theorem X.1.1 is taken from F. Kittaneh and H. Kosaki, Inequalities 
for the Schatten p-norm V, Publ. Res. Inst. Math. Sci., 23(1987) 433-443. 
The inequality (X.3) was proved by M.Sh. Birman, L.S. Koplienko, and 
M.Z. Solomyak, Estimates of the spectrum of the difference between fractional 
powers of self-adjoint operators, Izvestiya Vysshikh Uchebnykh Zavedenii. 
Mat., 19 (1975) 3-10. Its generalisation in Theorem X.1.3 is due to 
T. Ando, Comparison of norms $|||f(A) - f(B)|||$ and $|||f(|A - B|)|||$, Math. 
Z., 197(1988) 403-409. Our discussion of the material between Theorem 
X.1.3 and Exercise X.1.7 is taken from this paper. For p-norms, the inequality 
(X.10) has another formulation: if A, B are positive 

\[ \|A^{1/t} - B^{1/t}\|_{tp}^{t} \le \|A - B\|_p \quad \text{for } p \ge 1, \; t \ge 1. \]

The special case t = 2 of this was proved by R.T. Powers and E. Størmer, 
Free states of the canonical anticommutation relations, Commun. Math. 
Phys., 16 (1970) 1-33. The point of this formulation is that if A, B are 
positive Hilbert space operators and their difference $A - B$ is in the Schatten 
class $\mathcal{I}_p$, then $A^{1/t} - B^{1/t}$ is in the class $\mathcal{I}_{tp}$, and the above inequality is 
valid. 

Theorem X.1.8. is due to D. Jocic and F. Kittaneh, Some perturbation 
inequalities for self-adjoint operators, J. Operator Theory, 31(1994) 3-10. 
The proof of Lemma X.1.9 given here is due to R. Bhatia, A simple proof 
of an operator inequality of Jocic and Kittaneh, J. Operator Theory, 31 
(1994) 21-22. As in the preceding paragraph, for the Schatten p-norms, the 
inequality (X.12) can be written as 


\[ \|A - B\|_{(2m+1)p} \le 2^{2m/(2m+1)} \, \|A^{2m+1} - B^{2m+1}\|_p^{1/(2m+1)}, \]

for m = 1, 2, ..., $p \ge 1$ and Hermitian A, B. The result is valid in infinite-dimensional 
Hilbert spaces. A corollary of this is the statement that if the 
difference $A^{2m+1} - B^{2m+1}$ is in the Schatten class $\mathcal{I}_p$, then $A - B$ is in the 
class $\mathcal{I}_{(2m+1)p}$. 

The first inequality in Problem X.5.3 was proved by G.K. Pedersen, 
A commutator inequality (unpublished note). The generalisation in Prob- 
lem X.5.2, the inequalities in Problem X.5.1, and the second inequality in 
Problem X.5.3 are due to R. Bhatia and F. Kittaneh, Some inequalities for 
norms of commutators, SIAM J. Matrix Anal., 18(1997) to appear. The 
motivation for Pedersen was a result of W.B. Arveson, Notes on extensions 
of C*-algebras, Duke Math. J., 44 (1977)329-355. Let f be a continuous 
function on [0,1] with f(0) = 0, and let $\varepsilon > 0$. Arveson showed that there 
exists a $\delta > 0$ such that if A and X are elements in the unit ball of a C*-algebra 
and $A \ge 0$, then $\|AX - XA\| < \delta$ implies $\|f(A)X - Xf(A)\| < \varepsilon$. 
The inequality in Problem X.5.3 is a quantitative version of this for the 
special class of functions $f(t) = t^r$, 0 < r < 1. Weaker results proved 
earlier and their applications may be found in C.L. Olsen and G.K. Pedersen, 
Corona C*-algebras and their applications to lifting problems, Math. 
Scand., 64(1989) 63-86. It is conjectured that the factor 5/4 occurring in 
these inequalities can be replaced by 1. 

The inequality (X.20) for p = 1 was proved by H. Kosaki, On the continuity 
of the map $\varphi \to |\varphi|$ from the predual of a W*-algebra, J. Funct. Anal., 
59(1984) 123-131. For the Schatten p-norms, p > 2, the inequality (X.18) 
was proved by F. Kittaneh and H. Kosaki in their paper cited above. The 
other parts of Theorems X.2.3 and X.2.4, and Theorem X.2.1 were proved 
by R. Bhatia, Perturbation inequalities for the absolute value map in norm 
ideals of operators, J. Operator Theory, 19(1988) 129-136. 

The constant /2n in (X.25) can be replaced by a factor c, + logn. This 
has been known for some time and is related to other important problems 
in operator theory. See two papers by A. McIntosh, Counterexample to a 
question on commutators, Proc. Amer. Math. Soc., 29(1971) 337-340, and 
Functions and derivations of C*-algebras, J. Funct. Anal., 30(1978) 264-275. 
It is also known that such a factor is indeed necessary, both for the operator 
norm and for the corresponding inequality for the trace norm $\|\cdot\|_1$. This 
implies that, if $\mathcal{H}$ is infinite-dimensional, then the map $A \mapsto |A|$ on $\mathcal{L}(\mathcal{H})$ 
is not Lipschitz continuous. Nor is it Lipschitz continuous on the Schatten 
ideal $\mathcal{I}_1$. The inequality (X.24) due to Araki and Yamagami, on the 
other hand, shows that on the Hilbert-Schmidt ideal $\mathcal{I}_2$ this map is continuous. 
For other values of p, $1 < p < \infty$, E.B. Davies, Lipschitz continuity 
of functions of operators in the Schatten classes, J. London Math. Soc., 
37(1988) 148-157, showed that there exists a constant $\gamma_p$ that depends on 
p, but not on the dimension n, such that 

\[ \| \, |A| - |B| \, \|_p \le \gamma_p \, \|A - B\|_p, \]


for all A, B. Theorem X.2.5 was proved in T. Kato, Continuity of the map 
$S \to |S|$ for linear operators, Proc. Japan Acad. 49 (1973) 157-160, and 
interpreted to mean that the map $A \mapsto |A|$ is "almost Lipschitz". Results 
close to this were obtained by Yu.B. Farforovskaya in the papers cited 
above. 

Bounds like the ones in Section X.3 have been of interest to numerical 
analysts and physicists. References to much of this work may be found 
in R. Bhatia, Matrix factorizations and their perturbations, Linear Alge- 
bra Appl., 197/198 (1994) 245-276. Theorem X.3.1, and the proof given 
here, are due to C.J. Kenney and A.J. Laub, Condition estimates for matrix 
functions, SIAM J. Matrix Analysis, 10(1989) 191-209. Theorem X.3.4 
was proved in R. Bhatia, First and second order perturbation bounds for 
the operator absolute value, Linear Algebra Appl., 208/209 (1994) 367-376. 
Theorems X.3.3, X.3.6, X.3.7, and X.3.8 are also proved in this paper. The 
inequality in Problem X.5.5 is taken from J.L. van Hemmen and T. Ando, 
An inequality for trace ideals, Commun. Math. Phys., 76(1980) 143-148. 
This paper has references to physics literature, where such inequalities are 
used. The inequality in Problem X.5.6 is proved in the paper by F. Kittaneh 
and H. Kosaki cited earlier. Most of the results after Theorem X.3.9 
in Section X.3 were proved by R. Bhatia and K. Mukherjea, Variation of 
the unitary part of a matrix, SIAM J. Matrix Analysis, 15(1994) 1007-1014. 
The full potential of this method was exploited in the paper cited at the 
beginning of this paragraph, where several other matrix decompositions of 
interest in numerical analysis are studied. The results of Problem X.5.9 and 
X.5.10 are obtained in this paper. (Some of these were proved earlier using 
different methods by A. Barrlund, R. Mathias, G.W. Stewart, and J. G. 
Sun.) 

Bounds for the second derivative of the map $A \mapsto |A|$ are obtained in R. 
Bhatia, First and second order perturbation bounds for the operator absolute 
value, Linear Algebra Appl., 208/209 (1994) 367-376; and for derivatives of 
higher orders in R. Bhatia, Perturbation bounds for the operator absolute 
value, Linear Algebra Appl., 226(1995) 639-645. The reader may try to 
prove such inequalities using the methods explained in Section X.4. Since 
this map is the composite of two maps, $A \mapsto A^*A \mapsto (A^*A)^{1/2}$, its analysis 
can be broken into two parts. No good bounds of higher order are known 
for other matrix decompositions. 

Results in Parts (v) and (vi) of Problem X.5.8 are taken from R. Bhatia 
and K.B. Sinha, Variation of real powers of positive operators, Indiana 
Univ. Math. J., 43(1994) 913-925. In this paper it is also shown that the 
functions $f(t) = t^r$ on $(0, \infty)$ belong to the class $\mathcal{D}^{(1)}$ if $r \ge 2$, but not if 
$1 < r < \sqrt{2}$. We have already seen that these functions are in $\mathcal{D}^{(1)}$ for all 
real numbers $r \le 1$. 

In Section X.4 we have given a bare outline of differential calculus. More 
on this may be found in J. Dieudonné, Foundations of Modern Analy- 
sis, Academic Press, 1960, and in A. Ambrosetti and G. Prodi, A Primer 
of Nonlinear Analysis, Cambridge University Press, 1993. For calculus on 
manifolds, the reader could see S. Lang, Introduction to Differentiable Manifolds, 
John Wiley, 1962. In our exposition we have included several examples 
of matrix functions and formulae for higher derivatives of composite 
maps that are not easily found in other sources. 